Detailed Action
Abstract 

1. The abstract is objected to because of the following minor informality: 

> Page 40, lines 7 and 9: the phrase "said data" is indefinite. Use the actual
phrase, such as "transferring data" or "required data", wherever
appropriate. Avoid the word "said" in the abstract for better clarity.
Correction is required.

Specification 

2. The specification is objected to because of the following minor informalities:

> Page 2, lines 5-16: the paragraph needs to be revised for clarity and
better readability. For instance, the sentence on lines 9-11 is not clear
and comprehensible.
The lengthy specification has not been checked to the extent necessary to
determine the presence of all possible minor errors. Applicant's cooperation is
requested in correcting any errors of which applicant may become aware in the
specification.

Claim Objection 

3. Claims 32-35 are objected to because of the following minor informalities:
With regard to claim 32, the claim cannot depend on itself; hence claims 33-35 are
objected to as being dependent upon an objected base claim, but would be allowable if
rewritten in independent form including all of the limitations of the base claim and
any intervening claims. Appropriate dependency and corrections are required.

Allowance 

4. Claims 1-31 and 36-40 are allowed. The following is an examiner's statement
of reasons for allowance:

In reference to claims 1, 23 and 37: Hanson (U.S. Patent 6,378,013) teaches a
system for assessing performance of a device, such as a hard drive (see Hanson, Fig.
3 and column 2, lines 21-44). The method includes acts of establishing a virtual
drive. The processor is configured to transfer data between the virtual drive and the
hard drive subsystem, and to measure the transfer rate of the subsystem. Another
embodiment of the system includes a program storage device storing instructions
that, when executed by the computer, perform a method of measuring the performance of
the hard drive in a computer.

However, Hanson does not teach that the system specifies one or more
different required data transfer rates. The claimed invention further includes a
method that measures the actual data transfer rate for each access pattern and
determines performance of the storage device in relation to at least one required
data transfer rate as a function of the required data transfer rate and the actual
data transfer time of data for a given access pattern.
Citation of pertinent prior art:

> Srikrishna et al (IEEE Article, 'Predicting Track Mis-registration 
(TMR) From Disk Vibration of Alternate Substrate Material') teaches a 
method of quantitative measure for closed loop TMR due to disk 
vibration. 

> Pentakalos et al (IEEE Article, 'Analytical Performance Modeling of 
Hierarchical Mass Storage Systems') teaches a queuing network model 
that can be used to carry out capacity planning studies. 

> Getreuer (U.S. Patent 6,741,529) teaches method and apparatus for
moving carriage assembly from initial position to target position and
optical disc system.

> Smith et al (U.S. Patent 6,546,456) teaches method and apparatus for 
operating vehicle mounted disk drive storage device. 

> Klein (U.S. Patent 5,951,700) teaches method of computer system 
usage determination based on hard disk drive activity. 

> Liu (U.S. Patent 5,768,617) teaches intelligent hardware for
automatically reading and writing multiple sectors of data between a
computer bus and a disk drive.

> Shimizu et al (U.S. Patent 5,383,068) teaches head position 
recognition method, a speed calculation method, and a head movement 
speed control device. 
The remaining claims are dependent upon claims 1, 23 and 37 and contain
further limitations.

Conclusion 

5. This application is in condition for allowance except for the objections to the 
abstract, specification and claims as noted above. 

Prosecution on the merits is closed in accordance with the practice under Ex
parte Quayle, 1935 C.D. 11, 453 O.G. 213.

A shortened statutory period for reply to this action is set to expire TWO
MONTHS from the mailing date of this letter.

6. Any inquiry concerning this communication or earlier communications from 
the examiner should be directed to Elias Desta whose telephone number is (571)- 
272-2214. The examiner can normally be reached on M-Thu (8:30-7:00). 

If attempts to reach the examiner by telephone are unsuccessful, the
examiner's supervisor, Marc S. Hoff, can be reached on (571)-272-2216. The fax
phone numbers for the organization where this application or proceeding is
assigned are (703)-308-5841 for regular communications and (703)-308-5841 for
After Final communications.

Any inquiry of a general nature or relating to the status of this application or 
proceeding should be directed to the receptionist whose telephone number is (703)- 
308-1782. 
Predicting Track Misregistration (TMR) From Disk
Vibration of Alternate Substrate Materials 

Padmanabhan Srikrishna and Kumar Kasetty 



Abstract — Disk vibration is a significant contributor to track
mis-registration (TMR) in today's high performance drives with
tracks per inch (TPI) growing at the rate of more than 60% per
year and disk rotating speeds of 10 000 rpm. Aluminum disk
substrates result in significantly large TMR that can limit the TPI
that can be achieved, and hence the drive industry is being forced
to consider alternate disk substrate materials. In this paper we
identified a quantitative measure called "displacement metric"
that predicted closely the TMR due to vibration of disk substrate
materials. The displacement metric depends on substrate material
properties of Young's Modulus, damping and Poisson's ratio
and is a better predictor of TMR than traditional measures like
specific stiffness. Many substrate vendors are now using this
metric to improve the performance of new alternate substrate
materials. Laser doppler vibrometer (LDV) measurements in
a 10 000-rpm disk drive with aluminum, glass, glass-ceramic,
alumina, silicon carbide and zirconia disks showed that the
cumulative 3-sigma TMR metric due to disk vibration was proportional
to the reciprocal of the "displacement metric." Furthermore, we
developed a methodology to predict the closed loop TMR due to
disk vibration. The closed-loop TMR due to disk vibration was
compared for a range of servo bandwidths (800-1600 Hz) for
aluminum and alternate substrate materials. This modeling work
has proven to be useful in the development of new disk substrate
materials and predicting TMR due to disk vibration for higher
performance disk drives.

Index Terms — Alternate disk substrates, disk vibration, track
misregistration.



I. Introduction 

DEMAND for higher data storage capacity in hard drives
has led to ever increasing levels of data density on disk
platters. Current drive designs operating at 10 000 rpm employ
high values of tracks per inch (TPI) to achieve this goal. Inside
the drive, air turbulence generated by the rotating disk stack is
responsible for inducing vibration in both the disks and actuator
arm. Due to these combined effects, track misregistration
(TMR) during read and write operations can reach unacceptable
levels. Disk vibration is a significant contributor to track
misregistration (TMR). In the past, aluminum disk substrates have
met the TMR requirements. However, with much higher TPI
and disk rotating speeds now, the drive industry is being forced
to consider alternate disk substrate materials. Understanding of
the material properties that contribute to disk vibration is
important in the development of higher performance disk substrates.

Manuscript received July 16, 1999. This work was completed using 10 000
rpm preproduction disk drives at Quantum Corporation. Saint Gobain Industrial
Ceramics provided samples of silicon carbide, alumina, and zirconia disk
substrate materials.

The authors are with the Advanced Technology Group, Quantum Corporation,
Shrewsbury, MA (e-mail: srikrish@alum.mit.edu; kumarkasetty@quantum.com).

Publisher Item Identifier S 0018-9464(00)00263-6.

In this paper we have identified a quantitative measure called
"Displacement Metric" that depends only on the substrate material
properties of Young's Modulus, damping and Poisson's
ratio. The displacement metric must be maximized in order to
reduce disk vibration. The damping ratio in this metric was
estimated from the Quality (Q)-factor corresponding to the
fundamental frequency in the displacement spectrum of the disk.
We conducted laser doppler vibrometer (LDV) measurements
of disks made of aluminum and alternate substrate materials in a
10 000-rpm disk drive. The substrate materials studied were
aluminum, glass, glass-ceramic, alumina, silicon carbide and
zirconia disks. The measured axial displacements were converted
to radial track misregistration (TMR) in % data track units. The
area beneath the power spectral density was computed to
determine the cumulative 3-sigma TMR metric due to disk vibration.
Our results indicated that the reciprocal of the "displacement
metric" closely matched this open loop TMR metric due to disk
vibration.

Disk vibration is one of the sources of position error during
read and write operations in a drive. Previous work has failed to
consider how performance of disk substrate materials is altered
within a closed loop system for a set of actuator plant dynamics
and servo compensator design. This paper models the disk
vibration as a disturbance to the closed loop system and determines
the closed loop TMR for a series of servo compensators with
varying servo bandwidths. The closed loop TMR is compared
for aluminum and alternate substrate materials.



II. Reduction of Disk Vibration: Theory

This section develops a theoretical model to understand the 
various factors that contribute to disk vibration in a drive. For 
given disk dimensions, this model determines the substrate ma- 
terial properties that influence disk vibration. 

Each vibration mode of the disk has m nodal circles and n nodal
diameters and is designated by (m, n). An approximate
relationship between disk vibration amplitude and substrate
material properties and geometry is found from the elastic theory of
thin annular disks [1]. The natural frequency ω of an annular
disk that is clamped at the inner diameter and free at the outer
diameter is:

$$\omega = \frac{\lambda_{mn}^{2}}{2\pi a^{2}} \sqrt{\frac{E h^{2}}{12\,\rho\,(1 - \nu^{2})}} \qquad (1)$$
TABLE I
Modal Parameter "Lambda-Square" for Different Vibration Modes of
Annular Disk from Plate Theory [1]

Radial Mode   Circum. Mode   Dimensionless Parameter lambda-square
(m)           (n)            b/a=0.1   b/a=0.3   b/a=0.5   b/a=0.7
0             0              4.23      6.66      13        37
0             1              3.14      6.33      13.3      37.5
0             2              5.62      7.95      14.7      39.3
0             3              12.4      13.3      18.5      42.6

where
Young's Modulus                       E
Density                               ρ
Poisson's Ratio                       ν
Disk outer radius                     a
Disk clamping radius                  b
Disk thickness                        h
Mode                                  (m, n)
Dimensionless modal parameter         λ²_mn (refer to Table I)
(lambda-square)
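
As a rough numerical illustration of the natural-frequency relation, the sketch below evaluates the standard thin-plate expression f = λ²/(2πa²) · sqrt(E h² / (12 ρ (1 − ν²))). The function name and all input values are assumptions chosen for illustration (6.66 is one of the (0,0) entries listed in Table I; the material constants are generic aluminum-like numbers, not values taken from the paper).

```python
import math

def annular_disk_frequency_hz(lambda_sq, E, rho, nu, a, h):
    """Natural frequency (Hz) of a thin annular disk from classical plate theory.

    lambda_sq : dimensionless modal parameter (lambda-square)
    E, rho, nu: Young's modulus (Pa), density (kg/m^3), Poisson's ratio
    a, h      : outer radius and thickness (m)
    """
    stiffness_term = math.sqrt(E * h**2 / (12.0 * rho * (1.0 - nu**2)))
    return lambda_sq / (2.0 * math.pi * a**2) * stiffness_term

# Illustrative inputs only (assumed, not measured values from the paper).
f00 = annular_disk_frequency_hz(
    lambda_sq=6.66, E=72e9, rho=2700.0, nu=0.30, a=0.042, h=0.0008
)
print(f"(0,0) mode estimate: {f00:.0f} Hz")
```

The expression scales linearly with thickness and inversely with the square of the outer radius, which is the geometric lever the theory section goes on to discuss.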

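The maximum vibration amplitude (X) is approximated by
a 2nd order, 1-D, mass-spring-damper system [4] as

$$X = \frac{F_0}{k\left[\left(1 - (\omega/\omega_n)^{2}\right)^{2} + \left(2\zeta\,\omega/\omega_n\right)^{2}\right]^{1/2}} \qquad (2)$$

where
ω_n   undamped natural frequency = sqrt(k/m)
m     disk mass = ρπ(a² − b²)h
ζ     damping ratio.

Thus at resonance, the maximum displacement is

$$X_r = \frac{F_0}{2\zeta k} = \frac{F_0}{2\zeta\, m\, \omega_n^{2}} \qquad (3)$$

Replacing the frequency term from (1), we get (4), which can
be written as a product of two terms.

$$X_r \propto \left[\frac{a^{4}}{(a^{2} - b^{2})\,h^{3}\,\lambda_{mn}^{4}}\right]\left[\frac{1 - \nu^{2}}{E\,\zeta}\right] \qquad (4)$$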



Thus the vibration amplitude is decreased

1) By modifying disk dimensions and clamping conditions,
i.e., by reducing disk radius (a), increasing disk thickness
(h), or increasing the clamping diameter (b) to increase the
lambda-square parameter.

2) By using a substrate material with larger Young's Modulus
(E), damping ratio (ζ) and Poisson's ratio (ν). These
parameters form the displacement metric expressed in
the equation below. The displacement metric must be
increased to decrease vibration amplitude.



$$\text{Displacement Metric} = \frac{E\,\zeta}{1 - \nu^{2}} \qquad (5)$$
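
To make the ranking idea concrete, the following minimal sketch assumes the metric has the form E·ζ/(1 − ν²), which is what the stated dependence on Young's modulus, damping ratio and Poisson's ratio (and independence from density) implies. The two materials and their property values are hypothetical placeholders, not entries from the paper's tables.

```python
def displacement_metric(E, zeta, nu):
    """Assumed form of the metric: E * zeta / (1 - nu^2). Larger is better."""
    return E * zeta / (1.0 - nu**2)

# Hypothetical materials for illustration only.
materials = {
    "material A": {"E": 72e9, "zeta": 0.010, "nu": 0.30},
    "material B": {"E": 400e9, "zeta": 0.005, "nu": 0.25},
}

metrics = {name: displacement_metric(**props) for name, props in materials.items()}
for name, value in sorted(metrics.items(), key=lambda kv: kv[1], reverse=True):
    # The reciprocal is the quantity the paper relates to open-loop TMR.
    print(f"{name}: metric = {value:.3e}, reciprocal = {1.0 / value:.3e}")
```

Under this form a stiff material with modest damping can still outrank a softer, better-damped one, which is why the damping ratio has to be measured rather than assumed.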



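TABLE II
The Properties of the Disk Substrate Materials

Property                Aluminum      Glass         Glass Ceramic   Zirconia   Alumina   Silicon Carbide
Young's modulus (GPa)   [illegible]   100           [illegible]     223        400       430
Density (kg/m^3)        2860          2760          2700            6040       3900      3210
Poisson's ratio         0.30          [illegible]   [illegible]     0.25       0.25      [illegible]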


The displacement metric is used in this paper to predict
and compare the performance of substrate materials having
the same disk dimensions. The damping parameter in (5)
is dependent on the material properties and the fluid
dynamics that provide turbulent excitation to the rotating disk.
In a previous publication [5], the authors had described the
evaluation of damping of alternate substrates using transient
response method and Q-factor methods [4] and shown that the
damping parameter determined from Q-factor method matched
experimental results more closely.
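
The Q-factor approach can be sketched with the usual half-power bandwidth rule, Q = f_peak/Δf and ζ ≈ 1/(2Q) for light damping. This is a generic textbook procedure offered as an illustration; the paper does not spell out its exact steps, and the function name and the synthetic spectrum below are assumptions.

```python
import math

def half_power_damping(freqs, amps):
    """Estimate (Q, zeta) of the dominant peak from its half-power bandwidth."""
    i_peak = max(range(len(amps)), key=amps.__getitem__)
    target = amps[i_peak] / math.sqrt(2.0)

    def crossing(indices):
        # Walk away from the peak until the amplitude drops to the target,
        # then interpolate linearly between the two bracketing samples.
        prev = i_peak
        for i in indices:
            if amps[i] <= target:
                frac = (amps[prev] - target) / (amps[prev] - amps[i])
                return freqs[prev] + frac * (freqs[i] - freqs[prev])
            prev = i
        raise ValueError("spectrum does not fall to half power on one side")

    f_low = crossing(range(i_peak - 1, -1, -1))
    f_high = crossing(range(i_peak + 1, len(amps)))
    q = freqs[i_peak] / (f_high - f_low)
    return q, 1.0 / (2.0 * q)

# Synthetic single-mode displacement spectrum (assumed: 700 Hz, zeta = 0.01).
true_zeta, f_n = 0.01, 700.0
freqs = [600.0 + 0.1 * i for i in range(2001)]
amps = [
    1.0 / math.sqrt((1.0 - (f / f_n) ** 2) ** 2 + (2.0 * true_zeta * f / f_n) ** 2)
    for f in freqs
]
q_est, zeta_est = half_power_damping(freqs, amps)
print(f"Q = {q_est:.1f}, zeta = {zeta_est:.4f}")
```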

III. Experimental Methods

Velocity spectrum measurements were made in three drives
running at a constant speed of 10 000 rpm using a Polytec
fiber-optic laser doppler vibrometer (LDV) with a laser beam passing
through a slot in the top cover to measure velocity at the OD
of the top disk. The disk substrates tested had an outer
diameter of 84 mm, an inner diameter of 25 mm and a thickness of
0.8 mm (31.5 mil). Flatness was within 10 microns and
surface roughness less than 100 Å. Samples of aluminum, glass,
glass ceramic, zirconia, alumina and silicon carbide disks were
tested (Table II lists the properties). The average results from
three tests were summarized for comparison purposes.

Fig. 1. Computing track misregistration (TMR) from axial displacement (z)
at a given radius (r).

Fig. 2. Low (800 Hz) and high (1600 Hz) servo bandwidth model servo
sensitivity curves show different responses of the servo system to disk vibration.

The experiments were performed with the disk drive clamped
to avoid any baseplate motion and to make the experiments
repeatable. The laser beam was positioned using the micrometers
so that it reflected directly back into the lens
probe. The lens probe was focused for good signal to noise
ratio. The velocity output from the vibrometer controller was
provided as an input to a Hewlett-Packard (HP) 35670 dynamic
signal analyzer. We used the highest controller sensitivity,
25 mm/s/V, that did not saturate the measured signal, and
used a 100 kHz tracking filter. The HP analyzer computed the
power spectrum of velocity (in volts rms) with a flattop window
with 50 averages, for a span of 0-3200 Hz. The disk harmonics
(166.67*n Hz for 10 000 rpm) were removed to focus our
attention on the disk resonant modes alone. The displacement
spectrum was determined by integrating the velocity data and
removing the low frequency terms associated with integration
noise. The axial displacements were converted to radial track
misregistration units using a modal analysis study of the disks.
As Fig. 1 shows, the track misregistration is proportional to
the slope of the disk at a given radius. The FEM analysis
determined the slope/displacement ratio to be 0.051. The factor
[f = 0.051 * (Disk thickness + 0.5 * Slider thickness)] was
multiplied by the axial displacement in order to obtain the radial
TMR. The radial displacement was expressed in percentage data
track width by dividing by data track width (m) and multiplying
by 100. For this study we chose a TPI of 25 000 that translates
to a track width of 1.016e-06 m. The power spectral density
(m²/Hz) was determined from the displacement spectra and the
total area under the trace represents the variance, σ², of the TMR
[2]. The standard deviation is given by the square root of the
variance, and three standard deviations represent the 0-to-peak
vibration assuming a Gaussian distribution. The peak cumulative
3σ TMR was determined for each substrate, which represents
the open-loop contribution of disk vibration to TMR.
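
The arithmetic in this post-processing chain can be sketched as below. The helper names are assumptions, the thicknesses are taken to be in millimetres with the 0.051 ratio per millimetre (the text does not state the units), and the slider thickness used in the demonstration is a hypothetical value.

```python
import math

def track_width_m(tpi):
    """Track pitch in metres for a given tracks-per-inch value."""
    return 0.0254 / tpi

def radial_factor(disk_thickness_mm, slider_thickness_mm, slope_ratio=0.051):
    """Axial-to-radial factor as printed: 0.051 * (disk + 0.5 * slider)."""
    return slope_ratio * (disk_thickness_mm + 0.5 * slider_thickness_mm)

def percent_of_track(radial_displacement_m, tpi):
    """Express a radial displacement as a percentage of the data track width."""
    return 100.0 * radial_displacement_m / track_width_m(tpi)

def three_sigma(freqs_hz, psd):
    """3-sigma value from the area under a PSD trace (trapezoidal rule)."""
    variance = 0.0
    for i in range(1, len(freqs_hz)):
        width = freqs_hz[i] - freqs_hz[i - 1]
        variance += 0.5 * (psd[i] + psd[i - 1]) * width
    return 3.0 * math.sqrt(variance)

spindle_harmonic_hz = 10000.0 / 60.0          # 166.67 Hz per the text
factor = radial_factor(0.8, 0.3)              # 0.3 mm slider is hypothetical
print(f"track width at 25 kTPI: {track_width_m(25000):.4e} m")
print(f"spindle harmonic spacing: {spindle_harmonic_hz:.2f} Hz")
print(f"axial-to-radial factor: {factor:.5f}")
```

If the PSD is already expressed in (% track)²/Hz, `three_sigma` returns the cumulative 3σ TMR directly in percent of track width.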

IV. Closed Loop TMR Prediction for Alternate 
Substrates 

During read and write seek operations in a disk drive, disk
vibration is one of the sources of disturbances that cause
position error due to track misregistration (TMR). Actuator torque
disturbance, actuator dynamics, measurement noise, bearing
defects and spindle harmonics also influence this TMR. A simple
model of the control system was used to generate the servo
sensitivity function magnitude, which determines how disk vibration
and other position disturbances influence the position error. The
plant model used was (... * s^2) and a compensation model
K * (s^2/w^2 + 2 * ζ/w * s + 1)/s, resulted in a stable realizable
loop transfer function with the general shape of a realistic
control system. The servo bandwidth and phase margin are typically
limited by sampling effects, the system dynamics (resonances),
and the desire for robustness to variations across heads in a given
drive and across drives in a mass production run of a given drive
type. We varied the bandwidth or the open-loop crossover
frequency from 800-1600 Hz to understand the TMR contribution
from different disk substrate materials in a closed-loop system.
We assumed a desired phase margin of 30° (usually between
10 and 90) and the zero damping, ζ = 0.7 (typically 0.5-4).
The zero damping affected servo speed of response and the low
frequency rejection characteristics of the sensitivity function.
The low frequency gain K of the compensator was adjusted to
achieve the desired loop crossover.

Fig. 3. The resonant modes of disk vibration for an 84 mm OD and 0.8 mm
thick aluminum disk spinning at 10 krpm.

Fig. 4. The disk vibration spectra (in displacement units) and the cumulative
TMR without servo weighting (6.3% data track width) for aluminum disk.

The servo sensitivity function is plotted for a servo bandwidth
of 800 and 1600 Hz in Fig. 2. The low (800 Hz) bandwidth
sensitivity function attenuates disturbances like disk vibration below
500 Hz while a better servo design with a servo bandwidth of
1600 Hz can attenuate up to about 1100 Hz. It is also important
to note that the sensitivity function amplifies disk vibration in
certain frequency ranges. In this paper the displacement spectra
for aluminum and alternate substrate materials are weighted by
servo sensitivity functions of varying servo bandwidths and the
closed loop TMR is computed for all the cases.
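
The weighting procedure described in this section can be sketched as follows. The plant expression is not legible in this copy, so the sketch assumes a rigid-body double integrator 1/s² together with the compensator form K(s²/w² + 2ζs/w + 1)/s; the way K and w are chosen here (hitting the stated crossover and a 30° phase margin exactly) is also an assumption, and all function names are illustrative.

```python
import cmath
import math

ZETA = 0.7  # zero damping quoted in the text

def design_compensator(bandwidth_hz, phase_margin_deg=30.0, zeta=ZETA):
    """Return (K, w) giving 0 dB crossover at bandwidth_hz with the requested
    phase margin (valid for 0 < margin < 90 deg), for the assumed 1/s^2 plant."""
    wc = 2.0 * math.pi * bandwidth_hz
    # Loop phase is -270 deg plus the phase of the zeros, so the zeros must
    # contribute (margin + 90) deg at crossover.
    t = math.tan(math.radians(phase_margin_deg + 90.0))
    x = (-zeta - math.sqrt(zeta**2 + t**2)) / t  # x = wc / w, the root > 1
    w = wc / x
    k = wc**3 / abs(complex(1.0 - x**2, 2.0 * zeta * x))
    return k, w

def loop_gain(f_hz, k, w, zeta=ZETA):
    """Open-loop transfer function evaluated at s = j*2*pi*f (f must be > 0)."""
    s = complex(0.0, 2.0 * math.pi * f_hz)
    compensator = k * (s**2 / w**2 + 2.0 * zeta * s / w + 1.0) / s
    plant = 1.0 / s**2  # assumed double integrator
    return compensator * plant

def sensitivity_mag(f_hz, k, w, zeta=ZETA):
    """|S| = |1 / (1 + L)|: below 1 attenuates, above 1 amplifies."""
    return abs(1.0 / (1.0 + loop_gain(f_hz, k, w, zeta)))

def closed_loop_three_sigma(freqs_hz, psd, k, w, zeta=ZETA):
    """Weight an open-loop PSD by |S|^2 and return 3 * sqrt(area)."""
    weighted = [p * sensitivity_mag(f, k, w, zeta) ** 2 for f, p in zip(freqs_hz, psd)]
    variance = sum(
        0.5 * (weighted[i] + weighted[i - 1]) * (freqs_hz[i] - freqs_hz[i - 1])
        for i in range(1, len(freqs_hz))
    )
    return 3.0 * math.sqrt(variance)

for bw in (800.0, 1600.0):
    k, w = design_compensator(bw)
    margin = 180.0 + math.degrees(cmath.phase(loop_gain(bw, k, w)))
    print(f"{bw:.0f} Hz: margin = {margin:.1f} deg, |S| at 100 Hz = "
          f"{sensitivity_mag(100.0, k, w):.4f}")
```

Because the assumed loop differs from the authors' model, the exact frequencies at which |S| crosses unity will not necessarily match the 500 Hz and 1100 Hz figures quoted in the text; the sketch only shows the mechanics of the weighting.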

V. Results and Discussions 

The measurement methods discussed above were used to
compare the performance of aluminum and other alternate
substrate materials. Fig. 3 shows the velocity spectrum measured
by the LDV for an 84 mm outer diameter (OD) and 0.8 mm
thick aluminum disk.

The resonance modes have been identified on the spectrum:
the umbrella mode (0,0) and the split harmonics, due to a
spinning disk [2].

TABLE III
Comparison of Various Substrate Materials: Higher the Displacement Metric, Lower the Track Misregistration

Fig. 5. The reciprocal of the displacement metric is a good predictor of TMR
due to disk vibration.

A. Displacement Metric as a Prediction Tool for Open Loop
TMR

The total area under the power spectral density (m²/Hz) was
determined to compute the variance and the peak cumulative
3σ TMR. Fig. 4 shows the cumulative TMR of 6.2% data track
width (for 25 kTPI) for the aluminum substrate tested at 10 000
rpm.

Similarly the disk vibration spectra (refer to the Appendix)
and cumulative TMR metric for the alternate substrate
materials were determined (Table III).

The Q-factor corresponding to the fundamental mode (0,0) of
vibration was used to estimate the damping ratio for each
substrate material [5]. Table III lists the damping ratio determined
in this manner for the substrates. The highest damping ratio was
for glass ceramic substrate. The displacement metric was
computed [from (5)] for all the substrates. Table III also includes
the specific stiffness of the substrates. Table III clearly
indicates that the displacement metric is lower for substrates with
higher TMR. However, specific stiffness is not a clear predictor
of TMR. The specific stiffness of Zirconia is lower than that of
glass ceramic yet its TMR is lower. The displacement metric,
which is not dependent on the density [as in (5)], is able to
predict this trend better.




Fig. 6. The disk vibration of aluminum disks weighted by a servo-sensitivity
function of 1600 Hz bandwidth and the resulting cumulative closed-loop TMR.

Fig. 7. The closed loop TMR metric results for the alternate substrates as a
function of servo bandwidth.

Fig. 5 helps visualize the predictive nature of displacement 
metric. The reciprocal of the displacement metric multiplied by 
a constant is a good predictor of the cumulative TMR metric for 
all the substrates tested. 

This paper identifies two approaches to reduce disk vibration 
based on a theoretical model. For disks of the same dimensions, 
the displacement metric is a good predictor of the TMR due to 
disk vibration for alternate disk substrate materials. 

B. Prediction of Closed Loop TMR 

Fig. 8. Comparison of the displacement spectra of aluminum, glass, glass ceramic, zirconia, alumina and silicon carbide disk substrate materials.

The displacement spectra for aluminum and alternate
substrate materials were weighted by servo sensitivity functions of
varying servo bandwidths and the closed loop TMR was
computed for all the cases. The low servo bandwidth (800 Hz)
sensitivity function attenuates disturbances like disk vibration below
500 Hz while a servo design with a servo bandwidth of 1600 Hz
can attenuate up to about 1100 Hz. It is also important to note that
the sensitivity function amplifies the disk vibration in
certain frequency ranges. The displacement spectra for aluminum
substrate weighted by a servo sensitivity function with a
bandwidth of 1600 Hz and the closed-loop TMR metric are shown
in Fig. 6. The weighted spectrum shows reduced vibration
amplitudes for resonant modes below 1100 Hz but amplified
vibration above 1100 Hz. The closed loop TMR metric of 4.0% data
track is different from the open-loop TMR of 6.3% data track,
demonstrating that it is important to understand closed-loop
performance.

For a low servo bandwidth of 800 Hz, there is a large
difference between aluminum and the alternate substrates, and as the
servo bandwidth increases to 1600 Hz this difference is less
significant (Fig. 7).

VI. Conclusion

In this paper we identified a quantitative measure called
"displacement metric" that predicted closely the TMR due to
disk vibration in a 10 000 rpm disk drive for aluminum, glass,
glass ceramic, zirconia, alumina and silicon carbide disk
substrates. The displacement metric depends on substrate material
properties of Young's Modulus, damping and Poisson's ratio
and is a better predictor of TMR than traditional measures
like specific stiffness and is currently being used by substrate
vendors to improve the performance of new alternate substrate
materials. Laser Doppler Vibrometer (LDV) measurements
in a 10 000-rpm disk showed that the cumulative 3-sigma
TMR metric due to disk vibration was proportional to the
reciprocal of the "displacement metric." The TMR metric
without servo-sensitivity was found to be least for Alumina.

Closed loop TMR metric due to disk vibration was computed
and compared for a range of servo bandwidths (800-1600 Hz).
For a low servo bandwidth of 800 Hz, there is a large
difference between aluminum and the alternate substrates, and as the
servo bandwidth increases to 1600 Hz this difference becomes
less. The results indicate that the servo compensator design also
determines the alternate substrate that may be needed for a drive
program for a required operating TMR. While previous papers
[2], [3] have not addressed this issue, this is increasingly
important as disk vibration is one of the largest contributors to TMR.
This work marks an important step in the development of
new disk substrate materials and predicting TMR due to disk
vibration for higher performance disk drives.



APPENDIX 

Comparison of the displacement spectra of aluminum, glass, 
glass ceramic, zirconia, alumina and silicon carbide disk sub- 
strate materials. 
Analytical Performance Modeling of
Hierarchical Mass Storage Systems
Abstract— Mass storage systems are finding greater use in scientific computing research environments for retrieving and archiving
the large volumes of data generated and manipulated by scientific computations. This paper presents a queuing network model that
can be used to carry out capacity planning studies of hierarchical mass storage systems. Measurements taken on a Unitree mass
storage system and a detailed workload characterization provided the workload intensity and resource demand parameters for the
various types of read and write requests. The performance model developed here is based on approximations to multiclass Mean
Value Analysis of queuing networks. The approximations were validated through the use of discrete event simulation and the
complete model was validated and calibrated through measurements. The resulting model was used to analyze three different
scenarios: effect of workload intensity increase, use of file compression at the server and client, and use of file abstractions.

Index Terms— Mass storage systems, queuing network modeling, mean value analysis, Unitree central file manager, compression,
file abstraction.



1 Introduction 

MASS Storage systems are finding greater use in scien- 
tific computing research environments for retrieving 
and archiving data generated by model simulations in vol- 
umes on the order of terabytes. This demand on the mass 
storage systems is increasing at rates faster than the cur- 
rently operating mass storage systems can efficiently
handle [1], [2], [3]. To meet these demands, some computing
centers are procuring additional storage devices without 
access to the tools for predicting the performance of the 
expected workloads on existing and new mass storage sys- 
tem configurations. Performance models are necessary to 
carry out adequate capacity planning studies for mass stor- 
age systems. Queuing network models are a viable alterna- 
tive if accurate approximations can be found to deal with 
the features of mass storage systems which cannot be dealt 
with by exact models. Queuing network models can be 
used to provide average file storage and retrieval times and 
system throughput as a function of various parameters in- 
cluding file sizes, workload intensity, performance charac- 
teristics of the various physical storage devices that com- 
pose the mass storage system, and the architecture of the 
mass storage system hierarchy. 
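
For readers unfamiliar with the technique, the sketch below shows exact single-class Mean Value Analysis for a closed network of queuing stations. It is a simpler relative of the approximate multiclass MVA this paper builds on, not the authors' model; the station names and service demands are hypothetical.

```python
def mva(demands, n_customers, think_time=0.0):
    """Exact single-class MVA for a closed network of queuing stations.

    demands     : service demand per station (seconds per request)
    n_customers : number of circulating requests
    Returns (throughput, response_time, queue_lengths).
    """
    queue = [0.0] * len(demands)
    residence = [0.0] * len(demands)
    throughput = 0.0
    for n in range(1, n_customers + 1):
        # An arriving request sees the queue left by n - 1 customers.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        throughput = n / (think_time + sum(residence))
        queue = [throughput * r for r in residence]  # Little's law per station
    return throughput, sum(residence), queue

# Hypothetical demands (s) for a network link, a disk cache and a tape robot.
stations = {"network": 0.8, "disk cache": 1.5, "tape": 4.0}
for n in (1, 5, 20):
    x, r, _ = mva(list(stations.values()), n)
    print(f"N={n:2d}: throughput={x:.3f} req/s, response time={r:.1f} s")
```

As the population grows, throughput saturates at the reciprocal of the largest demand, which is how such a model identifies the bottleneck device in a capacity planning study.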

This paper describes the development of a queuing
network (QN) model to assess the performance of a hierarchical
mass storage system. The model was validated on the
Unitree Central File Manager, used at NASA's Center for
Computational Sciences (NCCS). The system being
modeled consists of a large number of workstations connected
to a single storage server via an Ethernet network and a
Cray supercomputer connected to that same storage server
via a high speed Ultranet network. The storage server is a
UNIX based multiprocessor that manages the devices
which comprise the hierarchy of the mass storage system.
The Unitree Central File Manager (UCFM) is an application
which runs on top of UNIX and manages the file systems at
each level of the hierarchy, as well as the flow of files from
one level of the hierarchy to another. The granularity of
access to the data by Unitree is at the file level. At the
particular installation where the modeling was performed,
users access the mass storage system through the ftp
protocol connecting to the central server at a specific port. Thus,
requests for storing and retrieving files arrive in the form of
put and get commands, respectively. Even though the
example used in this paper is based on the UCFM, the
techniques presented here can be used to model other mass
storage systems that adhere to the IEEE Mass Storage
System Reference Model [4].

Mass storage systems of this magnitude of storage
capacity have not been available until recently, so little other
work has been done to evaluate their performance.
Ramakrishnan and Emer [5] developed a closed queuing
network model to evaluate service alternatives for
distributed mass storage systems. The performance of providing
mass storage service at the level of a block versus a logical
file is compared using the queuing network model. The
definition of a mass storage system in [5] differs from the
one considered here, since they refer to a disk based storage
system without the additional complexity of traffic moving
between the various layers. Drakopoulos and Merges [6],
[7] use a closed queuing network model to evaluate the
performance of a hierarchical mass storage system for
various file movement criteria, and to study the trade-offs
between recalling a file and accessing it by a distributed file
system on various network architectures. In [6], [7] the
authors focus on the network interface of the mass storage
system and not on the internal interaction between the
layers. We develop here an approximate closed queuing
network model of Unitree. Our focus is on the performance
evaluation of the interactive traffic arriving over the
network in the form of ftp get and put requests. In an earlier
study [8], the authors developed a trace-driven simulation
of the NCCS site and concluded that the disk cache hit ratio
is very small (30 percent-40 percent) and that users tend to
reference either files that were just created or those which
were created a long time ago (over three months). In fact,
the simulation showed that, even if the disk cache were
large enough to hold all references in the previous three
months, the hit ratio would still remain within the 30
percent-40 percent range. Therefore, we are not including in
the model the effect of file migration policies.

The rest of this paper is organized as follows. Section 2 
describes in more detail the Unitree Central File Manager 
and its underlying hardware and software characteristics. 
Section 3 describes the workload imposed on the system 
which was used in our experiments. Section 4 describes the 
analytic model used. Section 5 discusses the validation of 
the model and several numerical results. Finally, Section 6 
provides some concluding remarks. 

2 Overview 

In this section, we describe briefly the functionality of the
UCFM and present its main hardware and software
characteristics included in the model.

2.1 The Unitree Mass Storage System

The UCFM is a hierarchical distributed file system which 
runs as an application on top of UNIX. UCFM is a mass 
storage manager which provides a transparent uniform 
UNIX-like file system to the user. The first layer of the hier- 
archy consists of a pool of 75 striped magnetic disks with a 
total capacity of 155 GB, which behaves as a file cache for 
the overall system. Four robotic tape storage silos, with a 
capacity of 4.8 TB each, comprise the second layer. The third 
layer is comprised of free-standing tape storage. 

The data stored on UCFM can be accessed from any local 
machine using either the FTP protocol or the NFS protocol. 
For performance reasons, only the FTP protocol method is 
used at NCCS. When files are first transferred to UCFM they 
are stored on the first layer of the hierarchy. Then, through a 
process called migration, a copy of the file is made available 
to a lower layer of the hierarchy. Based on certain configur- 
able parameters, files from the highest layer are removed if 
they have not been accessed for a certain period of time.
When the user tries to recall a file, UCFM retrieves the file 
from the highest layer on which it is located. 

UCFM is composed of a number of servers running on
the CONVEX machine that manages the storage hierarchy.
Following the IEEE MSSRM, each server is responsible for
one specific task. This distribution of responsibility and the
functional separation of the components allow for load
distribution, enhance the scalability of the storage system,
and provide for more fault tolerance. Fig. 1 shows a diagram
of the UCFM servers and their interrelation.
Fig. 1. UCFM system architecture.



A brief description of each of the servers in the figure 
follows. 

Name Server: Maintains the Unitree file system structure
and provides a transparent, UNIX-like interface to the
Mass Storage System. It resolves human-oriented names
to globally unique machine-oriented resource identifiers
(bitfile ids). The Name Server maintains an on-disk
database of name to bitfile id mappings, as well as an
in-memory cache of recently resolved mappings. The Name
Server also authenticates access rights of the requester.

Disk Server: Provides the logical means for storing and re- 
trieving data from the disk cache. It maintains the infor- 
mation necessary for mapping a bitfile id into the actual 
file stored on the disk. 

Disk Mover: Manages the transfer of file data to and from
the disk cache. All such requests originate from the Disk
Server. A response to each request is sent directly to the
recipient of the file rather than to the Disk Server.

Tape Server: Performs the equivalent service to tapes as the
Disk Server performs to the disk cache. Its objective is to
maximize the use of the storage media by archiving files.
It maintains all information needed to retrieve files from
tapes. It receives requests from the Disk Server and the
Migration Server for access to files.



PENTAKALOS ET AL.: ANALYTICAL PERFORMANCE MODELING OF HIERARCHICAL MASS STORAGE SYSTEMS 






Tape Mover: Manages the transfer of file data to and from
tapes. It receives all its requests from the Tape Server.

Physical Device Manager: Manages the tape mounts and per- 
forms the mapping between bitfile ids and tape ids. It re- 
ceives requests to mount tapes from the Tape Server and 
communicates its requests to the Physical Volume Re- 
pository to mount and dismount tapes. 

Physical Volume Repository: Maintains the information about 
the location of each tape and every storage device avail- 
able at the tape level. It receives requests to mount tapes 
from the Physical Device Manager and issues mount 
commands to the robot or operator.

Migration Server: Moves data from the disk cache to lower
levels of storage in the hierarchy in order to increase the 
amount of free space on the disk cache. 

Repacking and Vaulting Server: Removes the fragmentation 
from the tapes in the silos and performs the migration of 
files from the silos to the off-line tape drives. 

UCFM performs four major tasks to manage the overall 
mass storage system. The first task is the processing of user 
requests for file storage and retrieval. Files which are stored 
into the Unitree system are always placed on the disk cache 
first. When the file to be retrieved is located on the disk 
cache, the Disk Mover is used to transmit the file to the 
user. If the file is not in the disk cache, it has to be brought 
into the disk cache from either online or off-line tape and 
then transmitted to the user. The second task is the migra- 
tion of files from the disk cache to the silos, using various 
criteria for selecting the files to migrate. The criteria are that 
the files must reside on the disk cache for a certain amount 
of time before they can migrate, and there should be a cer- 
tain number of files ready to be migrated before migration 
starts. The third task is tape repacking, which is used for 
removing the fragmentation from tapes and for increasing 
the number of free tapes at each silo. When an updated file 
is brought into the disk cache, the old copy of the file which 
possibly resided in tapes is invalidated, causing fragmentation
in the tapes. Repacking starts when the number of free
tapes at a silo falls below a configurable parameter, and
tapes which have a certain percentage of fragmentation are
selected for repacking. When a tape is repacked, the data is
read from the tape into the disk cache, and then stored in 
an empty tape. The last major task is file vaulting, which is 
the migration of files from robotic tapes to off-line tapes. 
Vaulting is performed periodically and also when repack- 
ing cannot produce the desired number of free tapes. The 
files are selected for vaulting using a configurable file aging 
parameter. 

2.2 Main Hardware and Software Characteristics 

The large number of devices attached to the system and 
their shared use by several software components creates 
contention at various points of the hardware architecture. 
To ensure that the model captured that contention, it be- 
came necessary to analyze various components of the 
hardware architecture at a low level. 

The UCFM software runs as an application on a Convex
C3830 supercomputer with three independent CPUs. Using
a dynamic scheduling scheme, called automatic self-allocating
processors, CPUs are assigned to processes and threads using
both hardware and software interaction. Each CPU is
responsible for scheduling itself to an executing process.
Each thread of control posts the need for a CPU, and an idle
CPU selects the thread and executes it.

Fig. 2. I/O device connectivity.

TABLE 1
Disk Characteristics

Disk Type   Transfer Rate (MB/s)   Average Seek (ms)   Average Latency (ms)
1           6.00                   16.0                8.33
2           4.67                   12.0                6.87
3           9.34                   12.0                6.87
4           9.34                   11.5                5.55

Fig. 2 shows the interconnection of controllers and stor- 
age devices with the I/O bus of the system. There are 128 
disks attached to the system, 75 of which are used by the 
disk cache as striped devices. Striping will be described in a 
later section. As shown in Fig. 2, there are four Integrated 
Disk Channels (IDC), plugged into the main I/O bus of the 
server, which control the disks. The IDC consists of a Mo- 
torola 88100 RISC processor, separate code and data RAM, 
and four independent peripheral interface ports. Each pe- 
ripheral interface port conforms to the ANSI Intelligent Pe- 
ripheral Interface (IPI) physical interface specification and 
is capable of transferring data at the rate of 10 megabytes 
per second, concurrently with the other ports on the same 
IDC. Following the IPI standard, each IDC port has eight 
disks attached in a daisy chain fashion. Disks closer to the 
IDC port have higher priority over disks farther away in the 
chain. To ensure that this implicit priority assignment does 
not hinder performance, the faster disks were placed closer 
to the IDC port and slower disks were placed farther away.
Table 1 describes the characteristics of the four types of 
disks attached to the interface. 

The disk cache consists of 75 disks, distributed among 15
logical striped devices of five physical disks per striped
device. The five disks of each striped device were selected
to be on different controller ports, thus providing the
maximum possible concurrency to operations on a striped
device. Striping is implemented by device drivers within
the kernel and uses a Block Interleaved Distributed Parity
(RAID-5) disk array [9]. Under this method, one file system
block is interleaved across multiple disks. The fraction of
the block size determined by the interleaving is referred to
as the stripe unit. In this particular case, one block is stored
across four disks. Redundancy is provided by computing
the parity over the four stripe units, each of which
resides on a different disk, and storing the result on the fifth
disk of the striped device. The parity block is also distributed
over different disks to provide maximum concurrency
on read requests. This allows all five disks to participate in
read requests. The disadvantage of RAID-5 striping is the
read-modify-write operations that occur in small writes. A
write operation of one stripe unit in size requires five reads
to get the information so that the new parity block can be
computed, computation of the new parity, and two write
operations to update the data block that was modified and
to store the new parity result.
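
The parity scheme described above can be illustrated with a small sketch. It is not the Convex device driver's code; it only assumes the standard RAID-5 definition of parity as the byte-wise XOR of the stripe units, and the four stripe units and their contents are hypothetical values chosen for illustration.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Return the byte-wise XOR of equally sized blocks."""
    assert len({len(b) for b in blocks}) == 1, "blocks must have equal length"
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four hypothetical stripe units of one file system block (4 bytes each).
units = [
    bytes([1, 2, 3, 4]),
    bytes([5, 6, 7, 8]),
    bytes([9, 10, 11, 12]),
    bytes([13, 14, 15, 16]),
]

# The parity unit stored on the fifth disk of the striped device.
parity = xor_blocks(*units)

# If one disk is lost, its stripe unit is the XOR of the parity
# and the three surviving stripe units.
rebuilt = xor_blocks(parity, units[0], units[1], units[3])

# After a stripe unit is overwritten, the parity must be recomputed
# so that it again covers all four current stripe units.
new_unit = bytes([42, 42, 42, 42])
new_parity = xor_blocks(units[0], units[1], new_unit, units[3])
```

Because XOR is its own inverse, the same function serves both to generate parity and to rebuild a missing stripe unit.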

The second and third levels of the storage hierarchy consist
of four robotic tape storage libraries (silos) and four
free-standing tape drives. The silos manage the storage of
6,000 tapes each, providing fast robotically controlled
mounting. In this specific configuration of the silos, there
are six tape drives available for servicing read and write
requests. All tape drives, both the ones within the silos and
the free-standing ones, have the same physical characteristics.
A SUN workstation is connected to both the CONVEX
and the silos and maintains, for each tape, its location
(robotic silo, off-line tape drive, off-line storage) and its
status (mounted, stored). Requests for mounting tapes and
positioning them for a read or write operation are sent to
the silo by the SUN workstation. The tape drives are connected
to the CONVEX through the Tape Library Interfaces
(TLI) as shown in Fig. 2. Each TLI provides two independent
data paths to the six tape drives inside each silo.

3 Workload Characterization 

The data source for workload characterization was the log 
of ftp get and put requests. We were interested in evaluating 
the performance of the Unitree during periods of time 
when the interactive load was heavy. To determine the pe- 
riods of heavy Unitree usage, a histogram of the get and put 
requests was generated for each day for a period of 10 days. 
Fig. 3 shows two typical histograms of interactive requests, 
one for get and one for put requests. Across all histograms
considered, the interactive load on the system consistently
peaked between 9:00 a.m. and 6:00 p.m. This focused our
workload characterization on the requests arriving during
that period of time.

Fig. 3. Histogram of interactive requests.

The file size of both types of requests varied from small
files of a few kilobytes to huge files of a few hundred
megabytes. Ignoring this variation in file size would introduce
an error in the model; thus, the requests were separated
into multiple classes with different file sizes. To determine
the appropriate number of classes for each type of
request and the file size for each class, a clustering analysis
on measurement data was performed. We used the k-means
algorithm for various values of k [10], [11]. The algorithm
was initialized using k values uniformly distributed over
the range of file sizes and iterated until there was no more
shifting of points between clusters. The data for 10 days of
get and put requests was used for the characterization.
Separate data sets were generated for each day and for each
of the two types of requests. The algorithm converged after
five repetitions, on the average, for each data set used in
our experiments. The cluster centroids provide the file sizes
which determine the workload classes, and the membership
of points within a cluster determines the fraction of all
requests of a given type that have that specific size.
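
The clustering step described above can be sketched as follows. This is a minimal one-dimensional k-means under stated assumptions, not the authors' code: the initial centroids are taken as the midpoints of k equal-width bins over the range of file sizes (one way of spreading k values uniformly), and the sample sizes are hypothetical.

```python
def kmeans_1d(sizes, k, max_iter=100):
    """Cluster file sizes into k classes; return (centroids, fractions)."""
    lo, hi = min(sizes), max(sizes)
    # Assumption: initial centroids are midpoints of k equal-width bins.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: abs(s - centroids[c])) for s in sizes
        ]
        # Stop when no point shifts between clusters.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [s for s, a in zip(sizes, assignment) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    # Fraction of requests that fall in each class.
    fractions = [assignment.count(c) / len(sizes) for c in range(k)]
    return centroids, fractions


# Hypothetical file sizes in MB for one day of get requests.
sample_sizes = [1, 2, 3, 101, 102, 103]
centroids, fractions = kmeans_1d(sample_sizes, k=2)
```

The returned centroids play the role of the per-class file sizes, and the fractions play the role of the frequencies of occurrence.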

As the number of clusters k increases, a better clustering
of the points can be determined. On the other hand, as the
number of clusters increases, the computation time for
solving the queuing network also increases, since the number
of clusters determines the number of classes in the
queuing network. In order to find a value of k which both
attains a good clustering of the points and, at the same
time, allows for efficient solution of the queuing network, a
tightness measure was used [11]. Smaller values for the
tightness imply a better clustering of the data, since the
centroids are chosen to be closer to all the points within
their corresponding cluster. Even though, in the limit as k
approaches N (the number of data points), the tightness goes
to zero, the decrease in the tightness is not monotonic. A
locally minimal value of the tightness was observed for k = 4
for both get and put requests. Table 2 describes the workload
selected to drive the model, showing the file size of each
class selected and the frequency of occurrence of each class
out of a total of 3,691 requests during the measurement
period. The notation used to refer to the classes for the rest
of the paper is g_1 through g_4 for the four get request
classes and p_1 through p_4 for the four put request classes.



TABLE 2
Workload Characteristics

Class   File Size in MB   Frequency of Occurrence
g_1     1.2               33.8%
g_2     19.6              9.9%
g_3     78.9              4.2%
g_4     220.6             1.4%
p_1     1.7               42.3%
p_2     34.8              3.3%
p_3     77.7              3.9%
p_4     144.1             1.2%



4 The Model 

4.1 Request Sequence 

The objective of this study was to model the flow of the 
interactive requests through the various devices in the sys- 
tem. The fraction of time spent at each device of the system 
by each class determines the load demand of that class for 
that specific device. Some details about the functionality of 
the UCFM are needed here. The Name Server maintains a
database of name to resource id mappings on two separate
disks, neither of which is used by the disk cache or is a
striped device. It also maintains an in-memory cache of
recently resolved mappings. One of the two disks serves as
the primary database and the other serves as the secondary
database, for fault tolerance reasons. The disk server
maintains in memory the information necessary for retrieving
a file from the disk cache. Also, all the headers of files
stored in tape storage are stored on a few disks which are,
again, not used by the disk cache and are not striped devices.

The flow of both get and put requests through the sys- 
tem is shown below. 

Get request processing sequence:

1) Use of the CPU by the ftp daemon to establish the
connection.

2) Let p_ns be the probability that name resolution will be
done by the name server's cache. With probability p_ns,
use the CPU only to do name resolution. With probability
(1 - p_ns), use the name server's disk partition to
resolve the name.

3) Use of the CPU by the disk server to search the search
table for the file.

4) With probability p_h, the file resides in the disk cache.
So, use the striped device to retrieve the file from the
disk cache.

5) With probability (1 - p_h), the file is stored on tape. Let
p_ts be the probability that the file's header information
is stored in the in-memory tape header cache. With
probability p_ts, use the CPU to obtain the file's location
in tape storage.

6) With probability (1 - p_ts), use the tape header partitions
on the disk to locate the file's location in tape
storage.

7) With probability p_rs, the file is retrieved from robotic
tape storage into the disk cache.

8) With probability (1 - p_rs), the file is retrieved from
off-line tape storage into the disk cache.

9) Use of the CPU to transfer the file from the disk cache
to the user over the network.

Put request processing sequence: 

1) Use of the CPU by the ftp daemon to establish the 
connection. 

2) As with get requests, with probability p_ns, use the
CPU to resolve the name from the in-memory cache.
Else, with probability (1 - p_ns), resolve the name from
the name server's disk partition.

3) Use of the CPU to update the disk server's header in- 
formation for the new file created or for the file to be 
updated. 

4) Use of the CPU and disk to transfer the information to 
the disk cache. 
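
The branching in the get sequence above can be summarized by the probability that a request reaches each optional stage. The sketch below is an illustration only; the function and key names are hypothetical. The parameter names stand for the four branch probabilities in the list: name resolved from the name server cache (p_ns), file found in the disk cache (p_h), tape header found in the in-memory header cache (p_ts), and file in robotic rather than off-line tape storage (p_rs). The values of p_h and p_ts used below are made-up examples.

```python
import math


def get_stage_probabilities(p_ns, p_h, p_ts, p_rs):
    """Probability that a get request visits each optional stage."""
    miss = 1 - p_h  # the file is not in the disk cache and must come from tape
    return {
        "name_server_disk": 1 - p_ns,
        "disk_cache_hit": p_h,
        "tape_header_cache": miss * p_ts,
        "tape_header_disk": miss * (1 - p_ts),
        "robotic_tape": miss * p_rs,
        "offline_tape": miss * (1 - p_rs),
    }


# Hypothetical example: 35 percent disk cache hit ratio, no name cache hits,
# half of the tape headers cached, 80 percent of tape files in the silos.
stages = get_stage_probabilities(p_ns=0.0, p_h=0.35, p_ts=0.5, p_rs=0.8)
```

Every request either hits the disk cache or is served from exactly one of the two tape stores, so those three probabilities sum to one.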

4.2 The Queuing Network Model 

Fig. 4 is a diagram of the queuing network model. Each of
the major components of the model is enclosed within
dotted boxes. Starting from the left, the circle labeled "User
WS" is a delay server, which represents the time interval
between read or write requests arriving from the user
workstations. The next component is the CPU unit, which is
represented by a single queue and three servers. As described
in Section 2.2, the three CPUs are capable of symmetric
multiprocessing, so they can be accurately modeled
as three independent servers. The "Disk Cache" component
represents all the striped disks that form the Unitree's disk
cache. The "Disks" component represents the two disks
used for storing the name server database, and the five
disks used for storing the tape server search table and free
space map. These disks are modeled differently from the
disk cache's disks, since they are not striped devices. Finally,
the "Tape Devices" component represents the robotic
silo tape drives and the off-line tape drives. The next few
paragraphs describe the "Disk Component" and "Tape Devices"
component in more detail.
Fig. 4. Queuing network model diagram.



As described in Section 2.2, each disk is connected to the
CONVEX through one of the four ports on an IDC controller,
each port having eight disks connected to it. Since each
of the four ports can operate concurrently with one another,
the contention is at the individual port and not at the IDC
controller itself. Thus, we model each disk with a single
queue, since the only queuing delay is at the individual
disk for servicing the requests. Using the IPI interface
specification, for each port, a request is sent through the bus to
one of the eight disks. The bus is then released until the
disk is ready to transfer the data. Only when the disk has
completed its seek and latency does it request the bus so
that it can transfer the information. The bus is allocated to
one of the disks, based on the priority assignment described
in Section 2.2. A process-based simulation using CSim [12]
was developed to model the exact operation of an IPI port
to determine if modifications to the standard Mean Value
Analysis (MVA) equations were necessary to model the
priority. A system with a CPU, a user workstation, and a
bus with eight disks was simulated. The performance
parameters of the bus and of the disks used in the simulation
were the same as those of the actual hardware at the NCCS
site. The results of the simulation were compared with the
results of the standard MVA equations, as shown in Table 3.
The table shows the residence time at the disks and the
total response time R_total for both the analytic and simulation
models for various values of the degree of multiprogramming.
The total response time for the simulation also
includes 95 percent confidence intervals. The last column
indicates the percent error between the values of the residence
time at the disks obtained with the two models. As can be
seen from the table, the error is very small, so it was not
necessary to account for bus priorities in modeling the IPI
disks.
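
For readers unfamiliar with the standard MVA equations mentioned above, the following is a minimal sketch of exact single-class MVA for a closed network with one delay center and a set of single-server queuing centers. It is a textbook formulation rather than the model used in this paper: it does not cover the multi-server CPU, multiple classes, or the striped-device modifications, and the service demands below are hypothetical.

```python
def mva(demands, think_time, n_customers):
    """Exact single-class MVA.

    demands: service demand at each single-server queuing center.
    think_time: service demand at the delay center.
    Returns a list of (population, response_time, throughput).
    """
    queue = [0.0] * len(demands)  # mean queue lengths with zero customers
    results = []
    for n in range(1, n_customers + 1):
        # An arriving customer sees the queue lengths of a network
        # with one customer fewer (arrival theorem).
        residence = [d * (1 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = n / (think_time + response)
        # Little's law applied to each center.
        queue = [throughput * r for r in residence]
        results.append((n, response, throughput))
    return results


# Hypothetical example: two identical centers with unit demand, no think time.
balanced = mva([1.0, 1.0], think_time=0.0, n_customers=2)
single = mva([2.0], think_time=0.0, n_customers=3)
```

Increasing n_customers corresponds to increasing the degree of multiprogramming, which is the quantity varied in the comparison against the simulation.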

TABLE 3
IPI Port Priority Modeling

        Simulation                         Analytic
MPG     Disk Residence   Total Response    Disk Residence   Total Response   % Error
1       16.36            21.35 ± 0.45      16.00            21.20            2.20%
2       17.27            22.34 ± 0.59      17.54            22.79            1.56%
3       19.03            24.05 ± 0.43      19.13            24.43            0.52%
4       20.72            25.78 ± 0.50      20.77            26.11            0.24%
5       22.45            27.69 ± 0.76      22.25            27.83            1.60%
6       24.41            29.55 ± 0.61      24.17            29.58            0.98%
7       26.28            31.52 ± 0.73      25.92            31.36            1.37%
8       27.82            33.10 ± 0.82      27.69            33.16            0.47%
9       29.46            34.74 ± 0.89      29.48            34.99            0.07%
10      32.35            37.62 ± 0.84      31.29            36.83            3.28%
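
The percent error column of the table can be checked with a short calculation. The sketch assumes that the error is taken relative to the simulated disk residence time; the paper does not state the formula, but this assumption reproduces the three rows used below.

```python
def percent_error(simulated, analytic):
    """Relative difference between the two models, as a percentage
    of the simulated value (an assumption about how the column was computed)."""
    return abs(simulated - analytic) / simulated * 100


# (degree of multiprogramming, simulated, analytic) disk residence times
# copied from three rows of the table.
rows = [
    (1, 16.36, 16.00),
    (2, 17.27, 17.54),
    (10, 32.35, 31.29),
]

errors = {mpl: round(percent_error(sim, ana), 2) for mpl, sim, ana in rows}
```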



The 75 disks which comprise the disk cache are striped 
devices using RAID level 5, as described in Section 2.2. 
Each one of the service centers shown within the "Disk 
Cache" component represents the five disks which form 
each striped device. There has been a lot of recent work on 
the accurate modeling of the queuing and fork-join
synchronization present in RAID level 5 disk arrays [9], [13],
[14]. In order to accurately model the disk arrays in this 
paper, the standard MVA equations to compute the re- 
sponse time of each queue in the "Disk Cache" component 
had to be modified. The modifications made to the MVA 
equations and the validation of our approach are described 
in detail in Section 4.3. 

The "Tape Devices" component represents both the ro- 
botic tape drives and the off-line tape drives, since both are 
connected to the CONVEX I/O bus through the TLI con- 
trollers, as described in Section 2.2. A request for a read or a 
write to a tape drive, regardless of whether the tape is 
manually or robotically mounted, requires mount time, 
positioning time to place the heads at the correct position of 
the tape, and data transfer time. When a request arrives
from the Tape Server, the Physical Device Manager (PDM)
issues a tape mount request. The Physical Volume Repository
(PVR) determines whether the tape is located within
the silo or stored on the shelf, selects a tape drive to mount
the tape on, and issues the appropriate request. Queuing
devices 23-27 in Fig. 4 represent the robots that mount
the tapes on the tape drives. Devices 23-26 model the robots
in each of the four silos and device 27 models the human
operator. Queuing devices a-f model the six tape drives
inside each of the four silos and queuing devices a-d,
connected to device 27, model the four off-line tape drives.

4.2.1 Model Parameters 

In this section, we describe the parameters used to solve the
queuing network model and the method used for collecting
them. We also define the notation for the model's parameters
used for the remainder of this paper. The CPU
service times were measured using both the standard UNIX
utilities and the Unitree log files [15], [16].

The "User WS" is a delay station and its service demand
represents the mean interarrival time between the get or
put requests, denoted by Z_r, where r is the class of the job.
The possible values for r are g_1 through g_4 or p_1 through
p_4. The value of Z_r was measured from the FTP log file, and
the interarrival time was computed using the average time
between arrivals of the same class for each of the classes
considered. Table 4 lists the computed interarrival time for
each class of requests.



TABLE 4
Per Class Interarrival Times (in Seconds)

g_1    g_2    g_3     g_4     p_1    p_2     p_3     p_4
30.0   96.1   246.7   735.7   23.3   308.5   246.7   881.0



The rest of the parameters are described using a trace
through an execution scenario of a read request. When a
request arrives, it requires some CPU time for interaction
with the ftp daemon. This includes time for establishing the
connection, authenticating the user, and parsing the request.
This component of the service time is denoted as
t_{cpu,ftpo}. After the file has been located in the storage
hierarchy, it is transferred to the user via the ftp daemon, and
this CPU service time is characterized by the parameters
t_{cpu,ftpta} and t_{cpu,ftptb}. File names must first be
converted into resource identifiers before the system can
process them. The Name Server maintains a database of file
name to resource identifier records on disk and also maintains
a small cache of recent requests in memory. We denote by
p_ns the probability that the request will be resolved by the
in-memory cache, and by t_{cpu,ns} the service time at the CPU
by the Name Server for processing the request. Also, if the
request is not resolved by the in-memory cache, V_ns denotes
the number of visits to the name server's disk partitions for
resolving the request.

Using the resource identifier, the Disk Server attempts to
locate the file in the disk cache and succeeds with probability
p_h. In searching for the file, it consumes t_{cpu,ds} seconds of
CPU time. If the file is not located in the disk cache, then,
with probability p_rs, the file is in robotic tape storage and,
with probability (1 - p_rs), it is in off-line storage. The Tape
Server determines the location of the file in tape storage by
searching through its in-memory cache, and succeeds in
finding it in the cache with probability p_ts or, by searching
through the search table stored on the disk partitions, with
probability (1 - p_ts). To retrieve the information from the
disk, it visits the disks V_ts times. The PDM and the PVR
servers then consume t_{cpu,mount} of CPU time for processing
the mount request, which places the tape into a tape drive. The
file is then transferred to the disk cache and, from there, to
the user, over the network.

A summary of the input parameters to the model follows:

Z_r: interarrival time between requests of class r.
t_{cpu,ftpo}: average CPU time spent by the ftp daemon in
    establishing the connection and other overhead.
t_{cpu,ftpta}: average CPU time spent by the ftp daemon for
    setting up the transfer of a file.
t_{cpu,ftptb}: average CPU time spent by the ftp daemon for
    transferring one 64KB block of data.
t_{cpu,ns}: average CPU time spent by the name server daemon
    for resolving the filename into a resource identifier.
V_ns: number of visits to the name server's disk partitions for
    resolving a filename into a resource identifier.
p_ns: probability that the filename to resource id mapping is
    in the in-memory cache rather than at one of the two
    disk partitions.
p_h: probability that the file is located in the disk cache.
t_{cpu,ds}: average CPU time for locating the file in the Disk
    Server's search table.
p_ts: probability that the file's header is stored in the Tape
    Server's in-memory cache of file headers.
V_ts: number of visits to the tape server's disk partitions for
    retrieving the file's header.
p_rs: probability that the file is stored in robotic storage rather
    than off-line storage. Measurements taken by the system
    administrators at NCCS show that p_rs is equal to 0.8.
t_{cpu,mount}: average CPU time spent by the PDM server and
    PVR servers for servicing a mount request.
t_{cpu,movr}: average CPU time spent by the Tape Mover and
    Disk Mover in transferring one tape block from the tape
    to the disk cache.
N_{silos}: the number of robotic tape silos in the mass storage
    system.
N_{ontdrv}: the number of on-line tape drives in each silo.
N_{offdrv}: the number of free-standing off-line tape drives.
t_{rmount}: average CPU time for robotically mounting a tape.
t_{hmount}: average CPU time for manually mounting a tape.
taperate_t: rate of transfer of data by the tape drives. For the
    tape drives considered here, taperate_t = 2.5 MB/sec.
t_{seek}: average tape seek time (equal to 1/3 of the maximum
    seek time [17]).
blocksize_d: block size used by the disks, which is 16KB.
blocksize_t: block size used by the tapes, which is 15KB.

4.2.2 Computation of Service Demands

This section gives the equations used to compute the service
demands at each device of the queuing model using the
parameters described in the previous sections. The notation
for the service demands used throughout the rest of this
paper is also defined in this section. The service demand at
the delay center for class r is equal to the time interval
between arrivals for that class, Z_r. The values of Z_r for each
class are shown in Table 4.

The service demand equation at the CPU varies depending
on the request type. This happens because get and put
requests follow a different path through the system during
their service. For get requests, the service demand D_{cpu,g_i}
at the CPU for class g_i is computed using the equation:






IEEE TRANSACTIONS ON COMPUTERS. VOL. 46. NO. 10, OCTOBER 1997 



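[Equation (1): expression for D_{cpu,g_i}. Only fragments
survive in the source, including the term t_{cpu,ns} and the
ratio filesize_i / blocksize.]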



This equation accounts for all the CPU time used to service a
get request: the time spent at the ftp daemon, at the Name
Server, at the Disk Server, at the PDM and PVR servers, and
the time to transfer from the tape to the disk cache and from
the disk cache to the client. The actual values depend on the
file sizes for each of the classes, since t_{cpu,ftptb} and
t_{cpu,movr} are service demands per block. For put requests,
the service demand D_{cpu,p_i} at the CPU for class p_i is
computed using the equation:



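[Equation (2): expression for D_{cpu,p_i} in terms of
t_{cpu,ftpo}, t_{cpu,ftpta}, t_{cpu,ftptb}, and the ratio
filesize_i / blocksize. The operators and any remaining terms
did not survive in the source.]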



Since files stored into the Unitree are always placed in the 
disk cache, the service demand equation for put requests is 
simpler. The total service demand accounts for CPU time 
spent at the ftp daemon, at the name server, at the disk 
server, and for transferring the file. Again, the file transfer 
time depends on the file size for the specific put class. 

The service demand equations for the striped devices
also depend on the type of request. For both get and put
requests, the service demand for each striped device is a
function of the service demand of each disk comprising the
striped device. The notation used here is explained in detail
in Section 4.3. The service demand D^{j}_{i,g_i} of a get class
job at physical device j, which is a component of striped device
i, can be computed using the following equation:



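[Equation (3): expression for D^{j}_{i,g_i}. The legible terms
are (seek_j + lat_j) \phi(g_i), blocksize / trate_j, and
filesize / blocksize; the remaining factors did not survive in
the source.]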



where seek_j is the average seek time, lat_j is the average
latency, trate_j is the transfer rate of device j, and \phi(i) is a
function that gives the number of seeks needed to sequentially
access a file of size filesize_i. The Unitree file system
uses a varying block size allocation method. On the first
two requests for a block, a 64KB block is allocated. For
every subsequent request after that, the size of the block
allocated is doubled until it reaches a value of 4MB and
remains fixed after that. We assume that a seek and a rotation
are only needed for positioning the heads at the start of
each of the varying size blocks, while access to data within
the blocks is sequential. Using the size of the allocation
units for the Unitree file system, \phi(i) is given in Table 5. We
assume that the data is evenly distributed among the STR
striped devices. The service demand D^{j}_{i,p_i} of a put class
p_i job at physical device j, which is a component of striped
device i, is computed as



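[Equation (4): expression for D^{j}_{i,p_i}. The legible
fragments are the multiplier (2/STR), blocksize / trate_j, and
filesize / blocksize; the remaining terms did not survive in
the source.]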



The reason why the multiplier is one for get classes and two
for put classes is explained in Section 4.3. Table 1 shows the
average seek, average latency, and transfer rate for each of
the disk types. Once the service demand of each physical
device has been computed, the service demand $D_{i,g_i}$ on the
logical striped device $i$ by class $g_i$ of get requests is
computed using the equation:

$$D_{i,g_i} = H_K\, D^s_{i,g_i} \tag{5}$$

where $H_K = \sum_{k=1}^{K} 1/k$ and $D^s_{i,g_i}$ is the service demand of a
class $g_i$ request on any of the physical disks which comprise
striped device $i$. $H_K$ is used to estimate the expected
maximum of $K$ independent exponentially distributed service
times [18]. The service demand $D_{i,p_i}$ by class $p_i$ of put
requests is computed using the equation:

$$D_{i,p_i} = H_K\, D^s_{i,p_i}. \tag{6}$$

Equations (5) and (6) are explained in detail in Section 4.3,
along with the notation used in the equations.
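The allocation rule described above (two 64KB blocks, then doubling up to a 4MB cap, with one seek per block) can be turned into a short counting routine. The following is a minimal sketch under that reading of the text; the function name `seeks_for_file` and its parameters are illustrative and do not come from the paper.

```python
KB = 1024
MB = 1024 * KB

def seeks_for_file(filesize_bytes, first_block=64 * KB, max_block=4 * MB):
    """Count allocation blocks (one seek each) needed to hold a file.

    Assumed rule: the first two blocks are 64KB, each later block doubles
    in size until it reaches 4MB, and stays at 4MB afterwards.
    """
    if filesize_bytes <= 0:
        return 0
    seeks, covered, block = 0, 0, first_block
    while covered < filesize_bytes:
        covered += block
        seeks += 1
        # Doubling starts only after the second 64KB block has been used.
        if seeks >= 2:
            block = min(block * 2, max_block)
    return seeks

if __name__ == "__main__":
    for size_mb in (0.0625, 0.125, 1.2, 10.0):
        print(size_mb, "MB ->", seeks_for_file(int(size_mb * MB)), "seeks")
```

Under this rule the block sizes run 64KB, 64KB, 128KB, 256KB, and so on, so a file of about 1.2 MB needs six blocks.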

TABLE 5

Average Number of Seeks per Class

Class r    $\phi(r)$
g1          6
g2         11
g3         26
g4         60
p1          6
p2         15
p3         25
p4         42



Disks 16 and 17 in Fig. 4 are used by the Name Server for
storing its database. We make the assumptions that $p_{ns} = 0$
and that, in order to resolve a filename, the Name Server
needs to make two visits to one of the two disks. In an
earlier study [8], it was found that this mass storage system
is used as an archive and files are retrieved after they have
resided in the mass storage system for a long period of
time. Also, once they are retrieved, they are not accessed
again for a long time. This prevents the Name Server's
cache from reducing the number of disk accesses. After a
few hours of operation, the Name Server has in-cache
entries for the top level directories, but it still needs to make
one disk reference for the user's home directory resource id,
plus another disk reference to get the user's file resource id.
The equation for computing the service demand $D_{i,r}$ of class
$r$ at device $i$, where $i \in \{16, 17\}$, is:

$$D_{i,r} = 2 \times \frac{1}{2}\left(\mathit{seek}_i + \mathit{lat}_i + \frac{\mathit{blocksize}}{\mathit{trate}_i}\right)(1 - p_{ns}). \tag{7}$$

It is multiplied by two because two visits are required per
request and by 1/2 since we assume the entry is equally
likely to be in either one of the two disks.

Disks 18 and 19 are used by the Tape Server for storing
its search tables and disks 20, 21, and 22 are used by the
Tape Server for storing file headers for each of the files
stored on tape. Using the same reasoning as in the previous
paragraph, we make the assumption that $p_{ts} = 0$. We also
make the assumption that the Tape Server needs to make
one visit to one of the two search table disk partitions and
one visit to one of the three header disk partitions for each
request that it services. This is a reasonable assumption,
since a hashing algorithm is used in memory to find the
block on disk where the file's header information must be
within the search table disk partition and, once that is
known, the exact block where the header is located is also
known. The service demand at disks 18 through 22 is zero
for the write classes ($p_1, \ldots, p_4$), since files are written to
the disk cache first. The equation for computing the service
demand $D_{i,r}$ of class $r$ ($r = g_1, \ldots, g_4$) at device $i$, where
$i \in \{18, 19\}$, is:



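$$D_{i,r} = \frac{1}{2}\left(\mathit{seek}_i + \mathit{lat}_i + \frac{\mathit{blocksize}}{\mathit{trate}_i}\right)(1 - p_{ts}) \tag{8}$$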



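and the equation for the service demand $D_{i,r}$ for $i \in \{20, 21, 22\}$
and $r = g_1, \ldots, g_4$ is:

$$D_{i,r} = \frac{1}{3}\left(\mathit{seek}_i + \mathit{lat}_i + \frac{\mathit{blocksize}}{\mathit{trate}_i}\right)(1 - p_{ts}). \tag{9}$$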



Again we make the assumption that the information is 
evenly distributed among the disks. 

The TLIs are modeled using two components, as explained
in Section 4.2. Devices 23 through 27 from the
queuing network diagram in Fig. 4 represent the robotic
and human mount servers. The service time for $i \in \{23, 24,
25, 26\}$ is given by $t_{rmount}$ and the service time for $i = 27$ is
given by $t_{hmount}$. Using these values for the service times, the
service demands for class $r = g_1, \ldots, g_4$ for devices
$i \in \{23, \ldots, 26\}$ are given by:



$$D_{i,r} = (1 - p_h)\,\frac{1}{4}\,p_{rt}\,t_{rmount} \tag{10}$$

and, for $i = 27$, by:

$$D_{i,r} = (1 - p_h)(1 - p_{rt})\,t_{hmount}. \tag{11}$$

These service demands are all zero for classes $p_1, \ldots, p_4$,
since file puts go to the disk cache and not to the tapes. We
assume that accesses to robotic tape drives are evenly
spread among the four silos and the six tape drives. This is
a reasonable assumption since Unitree attempts to perform
load balancing by spreading the load among all available
resources. Finally, the service demand at each of the
tape drives within the "Tape Devices" component is simply
the amount of time it takes to seek to the position of the
file's data within the mounted tape plus the amount of
time to transfer the file from the tape to the disk cache.
Thus, the service demand $D_{i,r}$ for class $r = g_1, \ldots, g_4$ at device
$i \in \{23a\text{-}f, 24a\text{-}f, 25a\text{-}f, 26a\text{-}f\}$ is



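$$D_{i,r} = \frac{(1 - p_h)\,p_{rt}}{N_{silos}\,N_{rtdrv}}\left[t_{seek} + \frac{\mathit{blocksize}}{\mathit{taperate}}\left\lceil\frac{\mathit{filesize}_r}{\mathit{blocksize}}\right\rceil\right] \tag{12}$$

and, for $i \in \{27a, 27b, 27c, 27d\}$, is

$$D_{i,r} = \frac{(1 - p_h)(1 - p_{rt})}{N_{htdrv}}\left[t_{seek} + \frac{\mathit{blocksize}}{\mathit{taperate}}\left\lceil\frac{\mathit{filesize}_r}{\mathit{blocksize}}\right\rceil\right] \tag{13}$$

and is zero for the write classes since file puts are done on
the disk cache.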

4.3 Modeling of Striped Devices 

Striped devices cannot be modeled directly with the MVA
equations since they exhibit fork-join synchronization,
which does not satisfy product-form conditions; each
request is broken up into multiple independent requests
which must all complete before the original request is
complete. RAID level 5 presents the additional complexity that
each write request of a small amount of data, relative to the
stripe unit, results in a read-modify-write cycle. In the
Unitree system, the block size used for reading and writing
data is 64KB and the stripe unit is 16KB. Each logical
striped device consists of four disks for data and one disk
for the parity block, keeping in mind that the parity block
rotates among the five disks. In the case of read requests for
a single 64KB block, five independent requests will be
generated, and the original request will not be served until
all of them have completed. In the case of write requests
to a 64KB block, five independent read requests are
generated to read the 64KB of data plus the parity block, the new
parity is computed, and then five write requests are
generated to write the 64KB of data back plus the new parity
block. Fig. 5 shows an example where stripe units 2 and 3
have been modified. Five reads are generated and, after
they synchronize, the new parity is computed in the block
labeled PC. Then, five stripe units are written, even though
only stripe units 2, 3, and the parity P changed.




Fig. 5. Processing of a one block write by the striped device. 

In order to accurately model the striped devices, we 
modified the equation for computing the response time in 
the MVA equations. We first introduce some notation. 

$ND$: Number of disks forming the logical striped device. In
our case, $ND$ is equal to five. For each stripe unit, $ND - 1$
disks contain data and the other contains the parity.

$K$: Number of physical devices involved in satisfying a file
system block request ($K \le ND$). As described above, for
get requests, $K = 5$ and, for put requests, $K = 5$.

$\mathcal{D}_i$: Set of physical disks $j$ that form logical striped device $i$.

$\mathit{STR}$: Number of striped logical devices. In our case, $\mathit{STR}$ is
equal to 15.

$R_{i,r}$: Residence time of logical striped device $i$ for class $r$.

$D^s_{i,r}$: Service demand of a class $r$ request on any of the
physical disks which comprise striped device $i$. Since all
the physical disks which form a striped device are identical
and since we make the assumption that the data is
evenly distributed among the disks, $D^s_{j,r} = D^s_{i,r}\ \forall j \in \mathcal{D}_i$.

$\bar{n}^s_i(N - 1)$: Average number of jobs at any of the physical
devices which comprise device $i$.

Every request to the striped device generates K inde- 
pendent requests. The time it takes to service the original 
request is equal to the maximum of the residence times of 
the K requests. We thus set the response time of the logical 
striped device to: 

The service demand $D_{i,r}$ of the logical device is a function
of the service demand $D^s_{i,r}$ of any of the physical disks
which comprise the striped device. For get classes, (5) is
used for computing $D_{i,g_i}$ for logical device $i$. For put classes,
(6) is used for computing $D_{i,p_i}$. In both equations, we make
the assumption that the data is distributed evenly among
all striped devices, which explains the use of the factor
$1/\mathit{STR}$ in (3). In both get and put requests, five physical
devices are involved in servicing one block request; thus,
we used $H_5$. Also, for write requests, since there must be
one round of five read requests followed by a round of five
write requests, we multiplied the overall service demand by
two.
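The role of the harmonic number can be checked numerically: for $K$ independent exponential service times with the same mean, the expected maximum equals the mean times $H_K$. The sketch below compares that closed form with a small Monte Carlo estimate. The function names, the trial count, and the sample mean of 10 time units are illustrative assumptions, not values from the paper.

```python
import random

def harmonic(k):
    """H_k = 1 + 1/2 + ... + 1/k."""
    return sum(1.0 / n for n in range(1, k + 1))

def logical_demand(physical_demand, k=5):
    """Fork-join estimate: demand on the logical striped device, H_k * D^s."""
    return harmonic(k) * physical_demand

def simulated_mean_max(mean, k, trials=50_000, seed=1):
    """Monte Carlo estimate of E[max of k exponential times with given mean]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.expovariate(1.0 / mean) for _ in range(k))
    return total / trials

if __name__ == "__main__":
    d_physical = 10.0  # hypothetical per-disk service demand
    print("analytic :", logical_demand(d_physical))
    print("simulated:", simulated_mean_max(d_physical, 5))
```

The two numbers agree only for exponential service times; real disk service times are not exactly exponential, which is one reason the approximation is validated by simulation in the text.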

A process-oriented simulation using CSim [12] was
developed to validate this approximation. A system with a
CPU, a user workstation, and a five disk array was
simulated, and its output was compared with the results of the
modified MVA equations as shown in Table 6. The table
shows the residence time at the RAID disks and the total
response time $R_{total}$ for both the analytic and simulation
models for various values of the degree of multiprogramming.
The total response time for the simulation also includes
95 percent confidence intervals. The last column indicates
the percent error between the residence times at the RAID
disks obtained with the two models.



TABLE 6
RAID-5 Modeling

        Simulation                        Analytic
MPL     Residence   R_total               Residence   R_total    % Error
 1       45.94      115.81 ±1.72           45.67      115.67     0.58%
 2       51.85      124.28 ±2.24           54.13      127.83     4.39%
 3       59.56      136.94 ±3.03           63.60      141.46     6.78%
 4       68.30      149.51 ±1.99           73.92      156.29     8.22%
 5       78.04      164.84 ±1.12           85.13      172.42     9.08%
 6       88.82      180.47 ±2.26           96.77      189.16     8.87%
 7      100.81      197.36 ±1.54          108.84      206.51     7.96%
 8      112.62      215.20 ±2.66          121.24      224.34     7.65%
 9      124.36      230.90 ±2.14          133.89      242.53     7.66%
10      137.49      250.68 ±3.78          146.75      261.02     6.73%



As can be seen from the table, the approximation yields
errors below 10 percent for all values of the
degree of multiprogramming.
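The error bound can be rechecked from the residence times listed in Table 6. The sketch below assumes the percent error is taken relative to the simulated value, which is one plausible reading; the paper does not spell out the formula, so small differences from the printed column are possible.

```python
# Residence times at the RAID disks from Table 6, multiprogramming levels 1..10.
SIMULATION = [45.94, 51.85, 59.56, 68.30, 78.04,
              88.82, 100.81, 112.62, 124.36, 137.49]
ANALYTIC = [45.67, 54.13, 63.60, 73.92, 85.13,
            96.77, 108.84, 121.24, 133.89, 146.75]

def percent_errors(simulated, analytic):
    """Absolute error of the analytic value relative to the simulated one."""
    return [abs(a - s) / s * 100.0 for s, a in zip(simulated, analytic)]

if __name__ == "__main__":
    errors = percent_errors(SIMULATION, ANALYTIC)
    for level, err in enumerate(errors, start=1):
        print(f"level {level:2d}: {err:5.2f}%")
    print("largest error:", round(max(errors), 2))
```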

5 Numerical Results 

This section presents the measured parameters for the 
model, discusses the model validation and calibration, and 
presents the analysis of three different scenarios. The first 
scenario explores the effect of workload intensity increase 
on file transfer time and throughput. The second discusses 
the advantages of using two different types of compression 
strategies, and the third analyzes the benefits of using file 
abstractions. 

Table 7 shows the actual measured values for the
parameters used to compute the service demands at all
devices. All the parameters were collected using the standard
UNIX utilities and the Unitree log files. Using the equations
for service demands for each device described in
Section 4.2.2, the model was parameterized and solved using
the modified MVA equations. Due to the large number of
devices and classes, the approximate MVA algorithm was
used, which converged in only three to five iterations.
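For readers unfamiliar with approximate MVA, the following is a minimal sketch of a Schweitzer-style fixed-point iteration for a closed multiclass network. It is an assumption that this variant resembles the one used here; the text does not say which approximate MVA algorithm was applied, and the modified response-time equation for striped devices is not reproduced. All names and the sample demands are illustrative.

```python
def approximate_mva(demands, population, think_time=None,
                    tol=1e-6, max_iter=10_000):
    """Schweitzer-style approximate MVA for a closed multiclass network.

    demands[c][k]  service demand of class c at device k (positive somewhere)
    population[c]  number of class c jobs (must be positive)
    Returns (residence_times, throughputs, queue_lengths).
    """
    n_classes, n_devices = len(demands), len(demands[0])
    think = think_time or [0.0] * n_classes
    # Initial guess: each class spread evenly over the devices.
    queue = [[population[c] / n_devices] * n_devices
             for c in range(n_classes)]
    resp, tput = [], []
    for _ in range(max_iter):
        totals = [sum(queue[c][k] for c in range(n_classes))
                  for k in range(n_devices)]
        new_queue, resp, tput = [], [], []
        for c in range(n_classes):
            # Queue seen on arrival: all other classes plus (N_c - 1)/N_c
            # of the job's own class.
            r = [demands[c][k]
                 * (1.0 + totals[k] - queue[c][k] / population[c])
                 for k in range(n_devices)]
            x = population[c] / (think[c] + sum(r))
            resp.append(r)
            tput.append(x)
            new_queue.append([x * rk for rk in r])
        delta = max(abs(new_queue[c][k] - queue[c][k])
                    for c in range(n_classes) for k in range(n_devices))
        queue = new_queue
        if delta < tol:
            break
    return resp, tput, queue

if __name__ == "__main__":
    # Two hypothetical classes on two devices.
    r, x, q = approximate_mva([[1.0, 0.5], [0.2, 0.8]], [3, 2])
    print("throughputs:", [round(v, 4) for v in x])
```

With zero think time, the queue lengths of each class sum to its population at every iteration, which gives a quick sanity check on any implementation.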



TABLE 7
Measured Parameter Values

Parameter            Value (in secs)
t_cpu,ftpo           0.007229
(not legible)        0.00036
t_cpu,ftptb          0.0000618
t_cpu,ns             0.2070
t_cpu,ds             0.6776
t_cpu,mount          0.506
t_cpu,movr           0.005294
t_rmount             79.6
t_hmount             118.8



Validation and calibration are necessary steps in model
development [19]. They require that the system be measured
so that the measured performance metrics can be compared
with the ones obtained by the model. The first two
calibration efforts were aimed at fine tuning the service
demands at the disks and tapes as a function of the file
size. The hit ratio, $p_h$, was first set to one to eliminate the
tape subsystem. It was observed that the service demand for
the small file classes of get requests ($g_1$ and $g_2$) was smaller
than the measured value. Therefore, the value of the function
$\phi(r)$ was increased to 12 and 16, respectively, to make the
service demands match. Next, the hit ratio was made equal to
zero to introduce the tape subsystem and eliminate the disk
cache for the get classes. This allowed us to calibrate the
tape transfer rate parameter, $\mathit{taperate}$. This parameter was
calibrated at 1.48 MB/sec to make the transfer time computed
by the model match the measured value. This calibration was
later justified by independent measurements taken by the
system administrators at the NCCS site. Note that the
calibrated value for the tape transfer rate is smaller than the
manufacturer's advertised value. Finally, the technique of
calibration by adjusting the multiprogramming levels [19], [20]
was used to calibrate the model for low values of the workload
intensity. The adjustments implied an upward change of at
most two in the multiprogramming level of three classes. As
the workload intensity increased, such a calibration was not
necessary.

TABLE 8
Model Validation Results (Transfer Times in Seconds)

         p_h = 1                          p_h = 0
Class    Meas.     Model     % Diff.      Meas.      Model      % Diff.
g1        1.42      1.42     0.00%         85.57      92.20     7.74%
g2        6.35      5.73     9.76%        107.10     111.06     3.70%
g3       18.04     18.04     0.00%        183.97     171.56     6.75%
g4       48.26     48.27     0.02%        328.61     316.14     3.79%
p1        1.78      1.83     2.81%          2.02       1.82     9.41%
p2       10.16     11.12     9.45%         10.60      10.95     3.30%
p3       21.60     23.21     7.45%         21.74      22.85     5.11%
p4       44.86     42.07     6.22%         44.15      41.41     6.21%

A synthetic workload that mirrors the real workload, 
characterized in Section 3, was developed and measure- 
ments were taken for the purpose of validating the model. 
This synthetic workload generates a large number of re- 
quests for each of the classes. Table 8 shows a comparison 
between measured and computed file transfer times for all 
classes and for the two extreme values of the hit ratio. All 
the measurements were taken during times of the day 
when the system was otherwise idle to reduce the effect of 
external requests on the requests generated by our synthetic 
workload. The second column in Table 8 lists the average of 
measured values for requests from files that were stored in 
the disk cache. Column 5 lists the average of measured val- 
ues for requests that were satisfied from the tape subsys- 
tem. Columns 3 and 6 list the values computed, with the 
calibrated model for hit ratios of one and zero, respectively. 
Finally, columns 4 and 7 show the percent difference be- 
tween the measured values and the results computed with 
the calibrated model. The largest errors were 9.76 percent 
and 9.41 percent, both of which occurred in small classes. 

The validated model is used in the following subsections 
to analyze three different scenarios. 

5.1 Effects of Workload Intensity Increase 

In a report by the Computer Environments and Research
Requirements Committee, predictions are made on the
computing requirements for 1997-2004 on the data storage and
retrieval systems for supporting NASA earth and space
science research [21]. According to this report, it is expected
that, in the next two years, the archival and retrieval rate on
the UCFM will increase by a factor of five. In order to
determine acceptable service levels for the eight classes in our
workload, we conducted an e-mail survey of all users of the
mass storage system at NCCS. We obtained responses from
17 percent of the 450 users who received the survey. The
responses indicated the acceptable mean transfer time for
each of the eight classes.



To examine the performance of the mass storage system
under an increased workload intensity scenario, we varied
the load factor $M_r$, defined as the maximum number of
simultaneous class $r$ requests present in the mass storage
system. The baseline load factor mix was determined using
the frequency of occurrence of each class, shown in Table 2.
The values used for the multiprogramming level were
$M_{g_1} = 34$, $M_{g_2} = 10$, $M_{g_3} = 4$, $M_{g_4} = 1$, $M_{p_1} = 42$, $M_{p_2} = 3$,
$M_{p_3} = 4$, and $M_{p_4} = 1$. Figs. 6 and 7 show the variation of
the average file transfer time versus the multiplier of the
baseline load factor mix for classes $g_3$ and $g_4$ in the first
figure and $p_3$ and $p_4$ in the second figure, for a hit ratio of 0.3.
As can be seen in the figures, the increase in transfer time
due to a five-fold increase in the load factor will be
19 percent for class $g_3$, 28 percent for class $g_4$, 48 percent for
class $p_3$, and 47 percent for class $p_4$. The smaller get and put
classes are not as affected by the increase in the load factor.
The coefficient of variation in the survey responses for get
classes was higher than that for put classes. Therefore, we
took as the acceptable service level the mean plus two
standard deviations in the case of get classes and just the
mean in the case of put classes. The threshold for the get
classes was derived as a compromise between the service
levels desired by the user community and the constraints
imposed by the resources and budget of the existing
installation. For class $g_3$, the service level of 174 seconds
is violated when the load multiplier is 7.5. For class $g_4$, the
service level of 406 seconds is violated for a load multiplier
of 8.5, as can be seen from Fig. 6. For class $p_3$, the service
level of 52 seconds is violated for a load multiplier of eight,
while, for class $p_4$, the service level of 80 seconds is violated
for a load multiplier of seven, as shown in Fig. 7.
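The service-level rule described above can be sketched in a few lines: the threshold is the survey mean plus two standard deviations for get classes and the plain mean for put classes, and a violation occurs at the first load multiplier whose predicted transfer time exceeds that threshold. The survey responses and predicted times below are hypothetical placeholders, not data from the paper, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

def service_level(responses, is_get):
    """Acceptable transfer time derived from survey responses (seconds)."""
    m = mean(responses)
    # Get classes: mean plus two (sample) standard deviations; put: mean only.
    return m + 2 * stdev(responses) if is_get else m

def first_violation(multipliers, transfer_times, threshold):
    """Smallest load multiplier whose transfer time exceeds the threshold."""
    for m, t in zip(multipliers, transfer_times):
        if t > threshold:
            return m
    return None

if __name__ == "__main__":
    survey = [40.0, 50.0, 60.0]            # hypothetical responses
    limit = service_level(survey, is_get=False)
    loads = [1, 2, 3, 4, 5, 6, 7, 8]       # hypothetical model output
    times = [30, 33, 36, 40, 44, 48, 52, 57]
    print("threshold:", limit, "violated at:", first_violation(loads, times, limit))
```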

Fig. 6. Transfer time (in seconds) vs. load factor multiplier.
Fig. 7. Transfer time (in seconds) vs. load factor multiplier.

Fig. 8 shows the transfer time of get classes as the hit
ratio varies between 0.0 and 1.0, with the multiprogramming
level set to that used to validate the model. The effect of the
hit ratio on the transfer time depends on the service
demands at the disk cache and tape subsystem for each class.
As shown in Fig. 8, the transfer time decreases as the hit
ratio increases. This can be better understood by looking at
Fig. 9, which displays the variation of the service demand
for the disk cache and the tape system for class $g_4$ as a
function of the hit ratio $p_h$. Also shown in the figure is the sum
of the service demands for the disk cache and tape system.
As can be seen, as the hit ratio increases, the overall service
demand of the combined disk cache and tape system
decreases. This explains why the transfer times decrease as a
function of the hit ratio. The same kind of behavior was
observed for all classes. The factor by which the transfer
time decreases is larger for the small file classes. For
example, for class $g_1$, the transfer time at hit ratio one is 66 times
smaller than the same value at hit ratio zero, while, for class
$g_4$, the reduction factor is 6.6. The explanation for this effect
stems from the fact that, for the larger files (classes $g_3$ and
$g_4$), the overhead of mounting a tape is amortized over a



larger number of blocks than for the small file classes. Tape 
devices store data contiguously on tape and only exhibit a 
seek delay initially, when positioning the heads at the be- 
ginning of the file's data. This causes the effective transfer 
rates of tape drives to be comparable to those of disk de- 
vices, since disks exhibit per block seek and latency over- 
heads. Also, from this figure, we can infer that an increase 
in the hit ratio from the current measured value of 30 per- 
cent to 50 percent would decrease the transfer time of a 
class gj request by 25 percent and of a class request by 23 
percent. This provides incentive for research in prefetching 
techniques for caching. 

5.2 Analysis of File Compression 

We analyze here the effects of using file compression in two 
different manners: 

1) Client Compression: In this scenario, files are
compressed and decompressed at the client [22]. Thus,
before storing a file into the mass storage system, the file
is compressed at the client. Files are retrieved from
the mass storage system in compressed form and
decompressed at the client. The model is modified as
follows to account for this type of scheme: All service
demand equations that depend on file sizes have the
file size reduced by a file compression ratio $f_c$, which
now becomes a parameter of the model. The
compression and decompression times at the client are not
taken into account when computing the transfer time
under the client compression scenario. This was done
because files are usually retrieved from the mass
storage system in batch mode. Once retrieved by a client,
they are used many times, so the decompression time
at the client is amortized over many uses of the file.
Besides, different clients would give different values
for the compression and decompression time. Use of
compression is mainly geared toward increasing the
effective storage capacity of the mass storage system
as opposed to improving its performance.

2) Server Compression: In this scenario, files are stored in
the mass storage system in uncompressed form (in the
disk cache). After the file has been unreferenced for a
certain amount of time, it is compressed and remains
in the disk cache (this is done during off-peak
periods). The migration algorithm migrates the file in
compressed form to the tapes. A get request that finds
a file in the cache may find the file in compressed or
uncompressed form. If the file is compressed, it has to
be decompressed before it is transferred to the client.
Under this scenario, compression time is not relevant, since
compression is done during off-peak periods. Decompression,
however, has to be taken into account. To obtain the
parameters for the decompression time at the server,
we used the UNIX compress utility [23], which is
based on a variation of the Ziv-Lempel sliding
window, dictionary-based, data compression algorithm
[24], and measured the decompression times for
various file sizes. Linear regression was applied to
generate the following function that gives the
decompression time at the server (Convex):
$$\text{decompression time} = 1.14 \times \mathit{filesize} - 0.058, \tag{14}$$

where filesize is expressed in megabytes and the
decompression time in seconds.

In this scenario, we need to add two additional
parameters to the model: the compression ratio $f_c$ and
the probability $P_{cc}$ that a file is compressed given that
it is found in the cache. The service demands for the
get classes at the CPU (1) and physical disk devices
(3) were recomputed using the following equation:

$$D_{i,r} = (1 - P_{cc})\,D^{orig}_{i,r} + P_{cc}\,D^{comp}_{i,r} \tag{15}$$

where $D^{orig}_{i,r}$ is the equation with the original file sizes
and $D^{comp}_{i,r}$ is the equation with the compressed file
sizes. For the TLI devices (12), the equation is the
same as in the original model, but uses the
compressed file sizes. The service demands for the put
classes are not affected in this scenario, since files are
stored in the mass storage system in uncompressed
form.

Fig. 8. Transfer time (in seconds) of get classes vs. hit ratio.
Fig. 9. Disk cache and tape system service demands vs. hit ratio.
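The server-compression adjustments can be illustrated with two small helpers: the regression for decompression time from (14), and a probability-weighted blend of the demand computed with original and with compressed file sizes. The blend is an assumed reading of how the two demands are combined by the probability that a cached file is compressed; the function names and sample numbers are hypothetical, and clamping the regression at zero for very small files is an added safeguard that is not in the source.

```python
def decompression_time(filesize_mb):
    """Server decompression time in seconds from the linear fit in (14)."""
    # Clamp at zero: the fitted line goes negative below about 0.05 MB.
    return max(0.0, 1.14 * filesize_mb - 0.058)

def mixed_demand(demand_original, demand_compressed, p_compressed):
    """Expected demand when a cached file is compressed with given probability."""
    return (1.0 - p_compressed) * demand_original + p_compressed * demand_compressed

if __name__ == "__main__":
    print("decompress 10 MB:", round(decompression_time(10.0), 3), "s")
    # Hypothetical demands (seconds) with original and compressed file sizes.
    print("blended demand  :", mixed_demand(10.0, 4.0, 0.25))
```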

To assess the merit of the compression schemes, we
defined a metric called gain, defined as:

$$G_r = 100 \times (T_r - T_r^{c}) / T_r, \tag{16}$$

where $T_r$ is the file transfer time for class $r$ without
compression and $T_r^{c}$ is the transfer time for class $r$ with the
compression scheme. $G_r$ stands for the percent reduction in
transfer time.

Fig. 10 shows the gain $G_r$ as a function of the
compression ratio $f_c$ for all four get classes under client
compression. Larger classes benefit more from client compression,
since a large fraction of the service demand imposed by
these classes on the system depends on the file size rather
than constant factors, such as mount delays. For class $g_4$, a
realistically attainable compression ratio of 30 percent can
offer a performance gain of 23 percent. On the other hand,
smaller classes do not gain much from compression. Put
classes exhibit behavior similar to that of get classes, although
their gain is greater than that of get classes for the same
compression ratio. The reason for this is that put classes are
only affected by the disk cache, whose response time is a
function of the file size, whereas get classes are also affected
by constant delays at the tape subsystem.

The merits of server compression are explored in Fig. 11,
which shows the gain $G_r$ for class $g_4$ as a function of the
compression ratio $f_c$ for four different values of $P_{cc}$ (the
probability that the file is found in compressed form at the disk
cache). Decompression at the server imposes a considerable
load on the CPU that cannot be offset by the decreased file
size for large values of $P_{cc}$. This method can be beneficial only
if the value of $P_{cc}$ is kept below 25 percent for an average
compression ratio of at least 30 percent. In order for the value
of $P_{cc}$ to remain below 25 percent, the amount of time that the
file resides in the disk cache in compressed form before being
migrated should be adjusted by appropriate system software.
The other get classes exhibit similar behavior, whereas
put classes are not affected by this scenario.

Fig. 10. Client compression gain vs. $f_c$ for all classes.
Fig. 11. Server compression gain vs. $f_c$ for class $g_4$.

5.3 Analysis of File Abstraction 

Abstraction files are reduced versions of original files used
for browsing purposes [1]. According to [15], most NASA
image data can be significantly compressed (i.e., by a factor
of 20 or more) without significant loss in the visual
presentation of the data, using lossy techniques, such as wavelet
encoding and vector quantization. By servicing a number of
requests with abstraction files, rather than the complete
copies, the load imposed on the tape devices of a mass
storage system can be reduced considerably, thereby reducing
the transfer time experienced by most users. In order to
evaluate the effect of file abstraction on our model, four
more get classes were added to represent the requests
serviced by abstraction files. Two new parameters, $P_b$ and
$f_a$, have to be defined. $P_b$ is the probability that an access can
be satisfied by an abstract version of the object and $f_a$, the
file abstraction factor, is the factor by which a file is
abstracted. So, $f_a = 1\%$ implies that the abstract file is one
percent of the original file in size. The service demands for
the original get classes remain as before. The service
demands for the new abstract classes are computed using the
same equations used for the original get classes, but with
smaller file sizes to represent the abstracted form of the
files. The load factor ($M_r$) for the original classes is equal to
its original value multiplied by $(1 - P_b)$, while $M_r$ for the
abstraction classes is equal to the original value times $P_b$.

To assess the merit of the abstraction classes of requests,
we defined a new metric called gain, $G_r$, defined as



Fig. 12. Abstraction gain (in percent) vs. browse probability $P_b$ for class $g_4$, with one curve per value of the abstraction factor $f_a$.

TABLE 9

Abstraction Gain for Get Classes at $f_a$ = 10 Percent

P_b      g1       g2       g3       g4
0.1      0.17     2.17     5.33     7.75
0.3      0.49     6.26    15.39    22.36
0.5      0.79    10.15    24.96    36.27
0.7      1.08    13.86    34.07    49.51
0.9      1.36    17.39    42.75    62.13



$$G_r = 100 \times (T_r - T_r^{a}) / T_r, \tag{17}$$

where $T_r$ is the transfer time of the original get class $r$
without abstraction and $T_r^{a}$ is the transfer time under the
abstraction scenario, which is computed as

$$T_r^{a} = (1 - P_b)\,T_r^{orig} + P_b\,T_r^{abs},$$

where $T_r^{orig}$ is the transfer time of the original get class $r$ and
$T_r^{abs}$ is the transfer time of the corresponding abstraction
class. Note that both $T_r^{orig}$ and $T_r^{abs}$ are computed under the
abstraction scenario. The gain $G_r$ stands for the percent
reduction in transfer time.
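The gain metric and the abstraction bookkeeping are simple enough to express directly. The sketch below computes the blended transfer time, the resulting percent gain, and the split of a class's load factor between the original and abstraction classes. The function names are illustrative and the input numbers are hypothetical; in the model itself the two transfer times come from solving the queuing network under the abstraction scenario.

```python
def gain(t_without, t_with):
    """Percent reduction in transfer time."""
    return 100.0 * (t_without - t_with) / t_without

def abstraction_transfer_time(t_original, t_abstract, p_browse):
    """Expected transfer time when a fraction p_browse of requests is
    served by abstraction files."""
    return (1.0 - p_browse) * t_original + p_browse * t_abstract

def split_load(load_factor, p_browse):
    """Load factors for the original class and its abstraction class."""
    return load_factor * (1.0 - p_browse), load_factor * p_browse

if __name__ == "__main__":
    # Hypothetical transfer times in seconds.
    t_mix = abstraction_transfer_time(100.0, 10.0, 0.4)
    print("blended time:", t_mix, "gain:", gain(100.0, t_mix), "%")
    print("load split  :", split_load(10, 0.3))
```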

Fig. 12 shows the gain for class $g_4$ as a function of the
browse probability $P_b$ for three values of the abstraction
ratio $f_a$. As expected, the gain increases with the browse
probability and with the compression ratio. For this class,
reasonable gains can be achieved even for modest values
of the browse probability. For example, for a browse
probability of 40 percent and an abstraction ratio of one percent
(not unrealistic for image files), a reduction of 32 percent in
transfer time can be obtained. The gain decreases for small
file classes. Table 9 shows the gain for classes $g_1$ through
$g_4$ and for different values of $P_b$. As can be seen from the table,
for $f_a = 10\%$, the maximum gain for the small file class $g_1$ is
1.4 percent, while, for the large file class $g_4$, it is 62.1 percent.
From these figures, file abstraction appears to
be a powerful concept to be implemented for large files that
compress well and are likely to be browsed several times
before they are actually retrieved. Images are a good
example of this.

6 Concluding Remarks 

In this paper, we have developed an approximate closed 
queuing network model of the Unitree mass storage system 
used at NASA's Center for Computational Sciences. The 
approximations were validated by discrete event simulation
and the complete model was calibrated and validated with 
measurements taken at the NASA Goddard Space Flight 
Center Unitree Mass Storage System. The model was then 
used to analyze the performance of the mass storage system 
with respect to interactive requests for retrieving and stor- 
ing files. Eight classes were used as the workload, in order 
to accurately represent the different file sizes present in the 
get and put requests. The model was then used to study the 
performance of the system for various hit ratios at the disk 
cache, to predict the performance as the load increases, and 
to investigate the use of compression and file abstraction. 
The major contributions of the paper can be summarized as 
follows: 

• A complete workload characterization of a mass storage 
system at a large scientific installation (NCCS — NASA's 
Center for Computational Sciences). The workload 
characterization showed that most files retrieved are
small (1.2 MB), while large files represent a small 
fraction of the number of retrieved files. The same be- 
havior was observed for storage requests. 

• A validated queuing network model that can be used 
by managers of mass storage system sites to carry out 
capacity planning studies and support procurement 
decisions of expensive storage devices, such as tape si- 
los. Workload forecast studies done by the Computer 
Environments and Research Requirements Committee 
predict a five-fold increase in workload intensity over 
the next two years to support NASA's earth and space 
science research. The model was able to show that, 
under the predicted workload, the storage time of 
large files will increase by almost 50 percent and the 
retrieval time of large files will increase by close to 30 
percent.

• A novel and accurate MVA-based approximation for 
modeling the RAID disks that compose the disk cache 
at this mass storage system. We proposed a modification
to the MVA response time equation and we
showed, through simulation, that this modification 
accurately models the fork-join synchronization pres- 
ent in servicing requests by the striped device. 

• Examples of how the complete model could be used
to investigate the impact of compression and file
abstraction techniques on the performance of mass
storage systems. Client compression appears to be more
beneficial, since it can offer considerable performance
gains with a compression ratio of only 30 percent.
Server compression can be beneficial only if the
probability that the file is found in the disk cache in
compressed form is kept below 25 percent for an average
compression ratio of at least 30 percent. The model
showed that file abstraction is a powerful concept to
be implemented for large files that compress well and
are likely to be browsed several times before they are
actually retrieved. In fact, gains of the order of 62
percent can be achieved for large files for abstraction
factors of 10 percent (not unrealistic for image files).

The model presented in this paper can also be used to assess
design alternatives for mass storage systems, as well as to 
analyze the impact of using devices that incorporate faster 
technologies at the various levels of the storage hierarchy. 